Fermat’s Theorem—At Last!


If at first you don’t succeed, try Hecke rings again.

Not exactly words to live by, perhaps, but that’s what it took
for Princeton University’s Andrew Wiles to complete his astonishing
tour de Fermat. Experts now agree that Wiles has proved
Fermat’s Last Theorem, the enticingly simple assertion that the
equation $x^n + y^n = z^n$ has no solutions in positive integers x, y,
and z for any exponent n greater than 2.
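
To make the statement concrete, here is a minimal sketch of a brute-force search over small numbers. It is an illustration only, not part of Wiles’s argument: the function name `fermat_solutions` and the search limits are choices made here, and finding nothing in a finite range proves nothing about all integers.

```python
def fermat_solutions(n, limit):
    """Return all (x, y, z) with 1 <= x <= y <= limit, z <= limit and x**n + y**n == z**n."""
    # Map each n-th power back to its base so the lookup is a dictionary hit.
    nth_powers = {z**n: z for z in range(1, limit + 1)}
    found = []
    for x in range(1, limit + 1):
        for y in range(x, limit + 1):
            z = nth_powers.get(x**n + y**n)
            if z is not None:
                found.append((x, y, z))
    return found

# Exponent 2 has plenty of solutions, such as (3, 4, 5).
print(fermat_solutions(2, 20))

# Exponents above 2: the lists come back empty, as the theorem says they must.
for n in (3, 4, 5):
    print(n, fermat_solutions(n, 200))
```

The contrast between the first line of output and the rest is the whole content of the theorem, checked here only for numbers up to 200.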

Less than a year after discovering a flaw in his original proof, 
Wiles released two manuscripts containing a revised approach that 
not only avoids the problem area, but also makes for a simpler, 
more readable proof. Experts were quickly convinced that Wiles’s 
new proof is correct, and the two papers have been published not 
in, but as, the May 1995 issue of the Annals of Mathematics. One, 
titled “Modular elliptic curves and Fermat’s Last Theorem,” is sub- 
stantially the paper that Wiles first presented in 1993. The other, 
“Ring-theoretic properties of certain Hecke algebras,” co-authored 
with Richard Taylor of Cambridge University, fills in a crucial 
detail that yields the historic result. 

The Princeton mathematics department celebrated the proof’s 
publication with a public lecture, in which Wiles gave a personal 
account of his eight-year quest to fulfill a boyhood dream. 

Fermat’s Last Theorem has fascinated mathematicians, both 
young and old, for generations. Much of its allure stems from a 
comment Fermat made when he jotted the observation down in the 
late 1630s. He had been reading Bachet’s translation (into Latin) 
of the ancient Greek mathematician Diophantus’s treatise on arithmetic.
Next to a passage that (in effect) presented the general solution
of the equation $x^2 + y^2 = z^2$, which guarantees the existence
of infinitely many “Pythagorean triples,” Fermat wrote out his
claim about the absence of solutions for exponents greater than 2.
Then he added, “I have a remarkable proof, which unfortunately the
margin is too small to contain.”
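
The “general solution” for squares can be sketched in a few lines. The version below uses the classical parametrization usually called Euclid’s formula; the article does not spell out which form Diophantus’s passage took, so treat the formula and the names `primitive_triples`, `m`, and `k` as assumptions made for illustration.

```python
from math import gcd

def primitive_triples(max_m):
    """Yield primitive Pythagorean triples (a, b, c) from Euclid's formula.

    For coprime integers m > k > 0 of opposite parity,
    a = m^2 - k^2, b = 2mk, c = m^2 + k^2 satisfy a^2 + b^2 = c^2.
    """
    for m in range(2, max_m + 1):
        for k in range(1, m):
            if (m - k) % 2 == 1 and gcd(m, k) == 1:
                yield (m * m - k * k, 2 * m * k, m * m + k * k)

triples = list(primitive_triples(5))
for a, b, c in triples:
    assert a * a + b * b == c * c  # every generated triple really is Pythagorean
print(triples)
```

Because `max_m` can be taken as large as one likes, the supply of triples never runs out, which is the sense in which the solution guarantees infinitely many of them.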

“We don’t know exactly when Fermat wrote these words,” Wiles
told the audience in Princeton. “We’ll probably never know what
he meant exactly when he wrote them. But surely he would have
been astounded at the impact that these few, perhaps casually
written words were to have on the history of mathematics.”

It was not unusual for Fermat to claim a result without present- 
ing its proof. But he also made other observations for which he did 
not claim any proof—a distinction he regarded as important. 
Fermat, in fact, was among the first to insist that mathematical 
statements be completely proved, and not just “verified” with a 
handful of examples. Many of Fermat’s informal observations 
turned out to be wrong (see Box, “Fermat’s Folly,” page 9), but of 
those that he said he had proved, later mathematicians found proofs 
of every one—except, until now, the “Last Theorem” (which is how 
it got that name). 

Fermat’s unerringness adds to the theorem’s mystique: Did he 
really have a proof, or was he mistaken (but lucky)? Or was he 
playing a brilliant practical joke on posterity? 

Wiles came across the story of Fermat’s Last Theorem as a boy. 
It fascinated him. “I spent some of my childhood trying to solve it, 
on the assumption that Fermat had had a solu- 
tion, and that I didn’t know too much less than 
he would have known,” Wiles recalls. Later, 
he learned of the 19th-century work of 
Eduard Kummer, who introduced powerful
methods capable of proving Fermat’s Last 
Theorem for individual exponents. Wiles tried 
Kummer’s approach as well—with no more 
success than thousands of other mathematicians over the last hun- 
dred-plus years. Realizing that his boyhood dream could turn into 
a quixotic quest, Wiles turned to more mainstream number theory 
for his graduate work. “Like most professional mathematicians, I 
decided at this point to put [Fermat’s Last Theorem] aside,” he
says. 

Indeed, for the last hundred years, Fermat’s Last Theorem has 
been largely the province of amateurs and cranks. Kummer’s work 
and the discoveries of other 19th-century mathematicians led num- 
ber theorists in different directions that were seen as more prof- 
itable. One of those directions—the one Wiles chose—was the 
study of elliptic curves. 

It was a propitious choice. In 1986, Ken Ribet of the University 
of California at Berkeley proved the startling fact that Fermat’s Last 
Theorem is a consequence of an assertion in the theory of elliptic 
curves, known as the Taniyama-Shimura conjecture. Ribet’s result
brought Fermat’s Last Theorem back into the mainstream of modern
mathematics: The Taniyama-Shimura conjecture had been a
central problem in number theory for more than two decades (see
Box, “The Taniyama-Shimura Conjecture Made Simple,”
page 11).

Wiles seized the opportunity to return to his quest. Ribet “had 
finally given me the excuse to put my professional expertise to 
work against my childhood passion,” he says. “And immediately I
heard about it in the summer of ’86, I knew I would never let it
go.”

Wiles, who is soft-spoken to begin with, quietly pursued the
Taniyama-Shimura conjecture for the next seven years. Although
the conjecture was universally believed to be true, no one had a clue 
how to prove it. Wiles himself set out “without any idea how to 
proceed,” he recalls. But doing mathematics is often that way, he 
observes. “It’s like entering a darkened mansion. You enter a 
room, and you stumble months, even years, bumping into the fur- 
niture. Slowly you learn where all the pieces of 
furniture are, and you’re looking for the light 
switch. You get to know where all the furniture 
is, and finally you find the light switch. You turn 
it on, and the whole room is illuminated. Then 
you go on to the next room, and repeat the 
process.” 

Fermat’s Last Theorem does not require the full 
Taniyama-Shimura conjecture, which is an assertion about all
elliptic curves. It’s enough to prove the conjecture for a specific 
subclass, known as the semistable elliptic curves. Wiles slowly 
developed a strategy for attacking such subcases of the conjecture, 
taking up and extending techniques that he and others had devel- 
oped in the theory of elliptic curves. As he explained in his 
Princeton lecture, the proof ultimately involved four key steps: (1)
replacing elliptic curves with abstract objects known as Galois rep- 
resentations, for which there is a “language” known as deformation 
theory (largely created by Barry Mazur at Harvard University); (2) 
reducing the problem to what number theorists call a class-number 
formula; (3) proving that formula; and (4) tying up loose ends that 
arise, oddly enough, because the powerful methods don’t work in 
the simplest cases. 

The key breakthrough came in the spring of 1991, with the real- 
ization that the problem boiled down to a class number formula. In 
general, a class number formula relates the sizes of two seemingly
unrelated algebraic objects (often groups). For Wiles, the two
objects were, first, a “deformation ring” coming from Mazur’s 
deformation theory, and, second, a “Hecke ring” associated with 
modular forms. (Modular forms are what the Taniyama-Shimura
conjecture relates to elliptic curves.) 

“This was tremendously exciting for me,” says Wiles. “Not only 
was it a complete surprise, but it was back to a field that I knew 
most about.” 

Indeed, the reduction to a class number formula was similar to 
the approach Kummer had taken 150 years earlier, but in some 
ways much simpler. “I was convinced now that the proof was 
around the corner,” Wiles recalls. “It was,” he adds, “but the corner
was a bit longer than I anticipated.”

By the end of 1992, Wiles was confident he had proved the class 
number formula. Then, in May of 1993, he found an elegant argu- 
ment that tied up the loose ends. “At this point, I thought that was 
it, came down from my study, [and] excitedly told my wife,” Wiles
says. The next month he made his historic announcement at a
number theory conference in Cambridge, England, where he had
received his doctorate in 1977.

Princeton professors. From left to right: Peter Sarnak, Nicholas Katz, and Simon Kochen with Wiles at a celebratory
champagne reception in the Princeton mathematics department on June 28, 1993. (Photo courtesy of Denise Applewhite
and Princeton University.)

The mathematical world rejoiced that its most famous problem 
had finally been solved. Beyond the proof’s significance for 
Fermat’s Last Theorem, number theorists were amazed by the pow- 
erful mathematical machinery that Wiles had built in attacking the 
Taniyama-Shimura conjecture. It was as if he had begun with an 
abacus and produced a supercomputer. 

In the months that followed, however, experts poring over 
Wiles’s work discovered a flaw in his proof of the class number for- 
mula, one that Wiles could not immediately correct. The gap did 
not invalidate the main part of Wiles’s work (his methods still
proved the Taniyama-Shimura conjecture for large classes of elliptic
curves), but it broke the chain of logic for the class of elliptic
curves that are the key to Fermat’s Last Theorem. By December of 
1993, Wiles acknowledged that the gap was a serious problem that 
would take an indeterminate amount of time to correct. Some num- 
ber theorists wondered if the wait might be another 350 years. 

The problem with the proof lay in the construction of a compli- 
cated mathematical structure called an Euler system. Euler systems 
had only recently been developed, primarily by Victor Kolyvagin at 
the Steklov Institute in Moscow. They seemed ideal for proving the 
kind of class number formula Wiles had identified. Wiles had first 
tried to prove the formula by a “direct” approach that involved
“gluing” Hecke rings together, but in 1991 he decided to abandon
the direct approach in favor of the more elegant machinery of Euler 
systems. But Euler systems turned out not to be as well suited to 
Wiles’s class number formula as he had thought. The construction 
his 1993 result used contained a technical flaw—one that couldn’t 
be fixed. 

“The dragon was showing signs of waking up,” Wiles remem- 
bers. In the spring of 1994, Wiles detailed his ideas in a seminar at 
Princeton and began working with Richard Taylor, his former grad- 
uate student, now at Cambridge University. Wiles gradually 
became convinced that the Euler-system approach could not be 
made to work, and decided to return to his original, “direct” attack
on the class number formula. The roadblock that had steered his 
detour into Euler systems was still there: a “local intersection” 
property for Hecke rings that seemed to defy proof. By August, 
Wiles and Taylor had reached an impasse. “I was beginning to be
resigned at that point to a long haul,” Wiles recalls.

Taylor suggested they take one more look at Euler systems. 
That did it. Although Euler systems themselves did not work, and 
do not appear in the final proof, thinking through why they didn’t 
work gave Wiles the last insight he needed. 

“I was taking one last look at the Euler system, and tried to formulate
exactly what was wrong with it,” Wiles told the Princeton
audience. “Suddenly, on September 19, last year, I had this won- 
derful revelation. I saw in a flash how to glue together certain of 
the Hecke rings. This was the missing key to the old approach to 
the problem that I’d taken up to 1991, but had then abandoned in 
favor of Euler systems. My problems were over. I was so amazed 
by this that for several hours I put it down and did some adminis- 
trative chore, and then returned to it to check that it was still there. 
I kept doing this. It was so simple and so elegant that at first it 
seemed too good to be true. In fact, it was too 
good to be false.” 

The rapid acceptance of Wiles’s new proof 
bears that out. By mid-October he had a new 
manuscript, consisting mainly of the work he 
had presented the previous year in Cambridge,
but with the section on Euler systems replaced
by the simpler Hecke ring approach. The proof
sped through the review process. This time, the 
experts found no gaps. 

Mathematicians are quick to point out, how- 
ever, that acceptance and publication don’t by themselves establish 
that a proof is correct. The professional journals are littered with 
lapses of logic that have slipped by authors, editors, and referees 
alike. The real reckoning will come as Wiles’s proof is picked apart 
in scores of seminars around the world. The motivation for doing 
so goes beyond just wanting to verify the proof. Because the 
Taniyama-Shimura conjecture is so important to the theory of
elliptic curves, number theorists are eager to understand the tech- 
niques that underlie Wiles’s work. Indeed, Fred Diamond, a former 
student of Wiles now at Cambridge University, has already extended
the proof of the Taniyama-Shimura conjecture to a larger class
of elliptic curves, and many now feel the complete conjecture is 
within reach, a prospect that was only a dream just three years ago. 

Which problem will next capture the imagination of generations 
to come? “I’m very fortunate that I could spend my professional
life, or at least a big part of it, pursuing a childhood passion,” Wiles
says. Perhaps someone somewhere will pen a tantalizing comment 
into the margin of Wiles’s issue of the Annals. 


Not a single Fermat number
beyond the first four has ever
been shown to be prime.

Figure 1. The status of the Fermat
numbers with exponents up to $2^{22}$ has
been determined, but only a handful
have been fully factored.

Goro Shimura. (Photo courtesy of
Orren Jack Turner and Princeton
University.)

The Taniyama-Shimura
conjecture links two
seemingly disparate
mathematical fields:
the algebraic subject of
elliptic curves and the
analytic subject of
modular forms.

Figure 2. The elliptic curve $y^2 = x(x - 3)(x + 32)$ has many rational points. A line connecting any two of them intersects
a third. This makes it possible to generate all rational points out of just a few. (Figure from What’s Happening in
the Mathematical Sciences, Volume 2, page 5.)
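
The line construction in the caption can be tried out directly. The sketch below is an illustration under stated assumptions: the starting point (-4, 28) was found here by direct substitution and is not taken from the figure, and the helper names `on_curve` and `third_point` are hypothetical. Exact fractions keep every coordinate rational.

```python
from fractions import Fraction

# Curve from the caption: y^2 = x(x - 3)(x + 32) = x^3 + 29x^2 - 96x.
A2 = 29  # coefficient of x^2 in the expanded cubic

def on_curve(point):
    x, y = point
    return y * y == x * (x - 3) * (x + 32)

def third_point(p, q):
    """Third intersection of the curve with the line through p and q (distinct x-coordinates)."""
    (x1, y1), (x2, y2) = p, q
    slope = Fraction(y2 - y1, x2 - x1)
    # Substituting the line into the cubic gives a cubic in x whose
    # three roots sum to slope^2 - A2; two of the roots are x1 and x2.
    x3 = slope * slope - A2 - x1 - x2
    y3 = y1 + slope * (x3 - x1)
    return (x3, y3)

p = (Fraction(-4), Fraction(28))  # assumed starting point: (-4)(-7)(28) = 784 = 28^2
q = (Fraction(0), Fraction(0))    # an obvious point, since x = 0 makes the right side vanish
r = third_point(p, q)

assert on_curve(p) and on_curve(q) and on_curve(r)
print(r[0], r[1])  # 24 -168
```

Starting from two rational points, the arithmetic never leaves the rationals, which is why the third point is rational as well.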
Figure 1. “The Eight-Fold Way.” Group
theory meets geometry in the complex
symmetries of this work by mathematician-sculptor
Helaman Ferguson, commissioned
for the Mathematical Sciences
Research Institute in Berkeley, California.
(Figures 1, 2, and 4 from Helaman
Ferguson: Mathematics in Stone and
Bronze, Meridian Creative Group, Erie,
PA, 1994.)

Figure 2. “Three Knots.” The two trefoil knots (bottom pair) cannot be
deformed into each other, whereas the two figure-eight knots (top pair)
can. Topologists use invariants to tell when shapes are the same or different
(see text).


A Tale of Two Theories 


Mathematics and theoretical physics have quite different personalities.
Mathematics stresses rigor and proof, while
physics relies more on intuition and heuristic argument. But at the 
same time, the two disciplines have a lot in common. Each plays 
off the other, swapping insights, techniques, and points of view. In 
many ways, math and physics can be likened to a long-married cou- 
ple telling a story at a dinner party, each completing the other’s sen- 
tences. Their banter may lapse occasionally into bickering, but 
more often the back and forth is a benefit both ways. 

A recent case in point is a stunning discovery by physicists 
Edward Witten at the Institute for Advanced Study in Princeton and 
Nathan Seiberg at Rutgers University. Working in quantum field 
theory, which aims to explain the whys and wherefores of elemen- 
tary particles, Witten and Seiberg came up with a new approach to 
some traditionally intractable problems in physics. But their dis- 
covery has also led to a new way of looking at certain problems in 
four-dimensional geometry, one that vastly simplifies methods 
mathematicians had developed over the last 15 years. 

“I haven’t had this much fun since ... I can’t remember!” 
exclaims Cliff Taubes of Harvard University, one of the mathe- 
maticians who’s been scooping up the gold nuggets that the physi- 
cists’ methods have revealed. 

What insights could particle physics contribute to understanding 
four-dimensional geometry? Most people expect things would run 
the other way. Even Einstein’s theory of relativity, with its mix of 
space and time, seems more a client of, than a supplier to, the math- 
ematics of the fourth dimension. But insiders know otherwise. 
Physics and geometry are intimately intertwined. 

The common theme is symmetry. Geometry can be described as 
the study of properties that don’t change when you move objects 
around, such as the angles of a triangle or the lengths of its sides. 
A geometric object is said to be symmetric if some motion lands it 
back on itself—for example, rotating a square by 90 degrees or flip- 
ping it across one of its diagonals. For any geometric object, the 
set of all such motions turns out to be a group (see “Are Group 
Theorists Simpleminded?,” pages 82-89), called the figure’s symmetry
group (see Figure 1).
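
For the square, the symmetry group is small enough to list by machine. The following is a minimal sketch, assuming the corners are labeled 0 to 3 counterclockwise and each motion is recorded as a permutation of those labels; the names `ROTATE_90`, `FLIP_DIAGONAL`, `compose`, and `generate` are illustrative choices, not anything from the article.

```python
from itertools import product

# A motion is a tuple s in which corner i is carried to position s[i].
ROTATE_90 = (1, 2, 3, 0)      # quarter turn
FLIP_DIAGONAL = (0, 3, 2, 1)  # flip across the diagonal through corners 0 and 2

def compose(s, t):
    """Perform motion t first, then motion s."""
    return tuple(s[t[i]] for i in range(4))

def generate(generators):
    """Collect every motion reachable by repeatedly applying the generators."""
    identity = (0, 1, 2, 3)
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            new = compose(h, g)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group

square_group = generate([ROTATE_90, FLIP_DIAGONAL])
print(len(square_group))  # 8

# Closure: following one symmetry by another always gives a symmetry.
assert all(compose(a, b) in square_group for a, b in product(square_group, repeat=2))
```

The eight motions are the four rotations (including the one that does nothing) and the four flips.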

Edward Witten. (Photo courtesy of
Princeton University.)

Physicists realized early this century that many of their conservation
laws stem from symmetries in the apparent structure of the
universe. For example, the notion that the universe is invariant 
under translation or rotation—that is, the premise that the results of 
an experiment don’t depend on the location of the laboratory or the 
direction the equipment faces—turns out to imply the well-known 
conservation laws for momentum and angular momentum, and 
vice versa. The German mathematician Emmy Noether proved 
this holds in general: Every conservation law is equivalent to a 
symmetry. 
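
The simplest case, in the symmetry-to-conservation direction, fits in three lines. What follows is a standard textbook calculation rather than anything taken from the article, and it assumes a system described by a single coordinate; the symbols $L$, $q$, $\dot q$, $p$, and $\varepsilon$ are introduced here for illustration.

```latex
% Assumption: one coordinate q(t) and a Lagrangian L(q, \dot q).
% Equation of motion (Euler--Lagrange):
\frac{d}{dt}\,\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}.
% Translation symmetry: L(q + \varepsilon, \dot q) = L(q, \dot q) for every \varepsilon,
% which forces
\frac{\partial L}{\partial q} = 0.
% Hence the momentum p = \partial L / \partial \dot q is conserved:
\frac{dp}{dt} = \frac{d}{dt}\,\frac{\partial L}{\partial \dot q} = 0.
```

Replacing the shift in position by a rotation through a small angle gives conservation of angular momentum in the same way.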

Noether’s theorem was a boon for the burgeoning field of parti- 
cle physics. (It’s hard to believe the electron is less than a hundred 
years old—J.J. Thomson pinned it down in 1897—and that the neutron,
which accounts for more than half the mass of our planet, is
barely eligible for Social Security.) By the 1960s, theorists had 
amassed many more conservation laws, including the conservation 
of electric charge, baryon number, “strangeness,” and isospin. 
Each has an associated symmetry group, and physicists were quick 
to pick up the basics of group theory—much to the delight of math- 
ematicians, who had invented the subject for completely different 
purposes. 

Group theory arose in the study of algebraic equations and what 
it takes to solve them, but mathematicians soon found groups to be 
surprisingly ubiquitous. (The mathematician André Weil, whose 
office at the Institute for Advanced Study is across the hall from 
Witten’s, once remarked, “When in doubt, look for the group.”)
One place groups crop up is in the study of differential manifolds— 
abstract spaces that look “locally” like familiar (to mathematicians, 
at least) n-dimensional euclidean space, but whose “global” struc- 
ture can be unimaginably complex. 

Actually, not all differential manifolds are abstract. The surface 
of a sphere is one concrete example: For all practical purposes, an 
acre of land is as flat as the euclidean plane, but we all know the 
earth—globally—is round. The surface of a donut (which mathe- 
maticians call a torus) is another familiar manifold. Both examples 
illustrate what mathematicians call “compact” spaces.
Compactness is among the subtler concepts in topology, but it’s rel- 
atively easy to explain. Imagine an observer stationed at every 
point of a topological space, each reporting on some tiny neighbor- 
hood around itself. The space is compact if you can throw away all 
but a finite number of these reports and still have a complete “survey”
of the space. The usual euclidean plane is not compact (its
infinite extent precludes any finite description), but the sphere and
the donut are compact (though the proofs are more difficult
than you might expect).

One of the major problems in twentieth-century mathematics 
has been to classify all compact spaces in dimension three and 
higher. The goal is to find, for each dimension, a complete list of 
compact manifolds, such that every other manifold of the same 
dimension is merely a distorted copy of one (and only one) mani- 
fold on the list. Group theory plays a role both in constructing 
examples of manifolds and in classifying them. 

In dimension one, the answer is easy: The list consists of the cir- 
cle and nothing else. The two-dimensional case was settled in the 
nineteenth century (well before the notion of compactness had been 
formalized!). The list consists of the sphere and the sphere with a 
finite number of “handles” (the torus is a sphere with one handle;
see Figure 3). 

In three dimensions, there is a list, but it’s not known to be com- 
plete. The conjecture, due to William Thurston, now director of the 
Mathematical Sciences Research Institute at the University of
California at Berkeley, is that every three-dimensional
manifold is stitched together in a prescribed manner out
of certain prescribed pieces, most of which have a
“hyperbolic” structure (see Figure 4).

Figure 3. Two-dimensional manifolds are classified by the number of “handles”
on each.

Figure 4. “Thurston’s Hyperbolic
Knotted Wye I.” The abstract beauty
of three-dimensional manifolds is made
tangible in this 1500-lb Carrara marble
sculpture by Helaman Ferguson,
commissioned for the Geometry Center
in Minneapolis, Minnesota. The surface
texture consists of carved hemispherical
voids suggesting geodesics in
a model of hyperbolic space.

In four dimensions, there’s not even a conjecture for 
the classification problem. “We have no clue what a 
complete list of four-dimensional manifolds should contain,”
says Taubes. There are three difficulties: It’s hard
to find potentially new examples of four-dimensional 
manifolds; it’s hard to tell whether such an example is 
really different from previously known manifolds; and it 
can be even harder to tell if it’s actually the same as a 
previously known manifold. 

The last two questions may sound the same, but 
they’re not. The main tool mathematicians use to dis- 
tinguish one manifold from another is to associate man- 
ifolds with algebraic quantities called invariants. These 
are numbers, polynomials, or other algebraic entities— 
especially groups—that don’t change when the space is 
deformed. For example, in two dimensions, the number 
of “handles” is an invariant: No matter what you do to
a rubber donut, short of biting into it, you can’t get rid 
of the hole. (Mathematically, this invariant is defined as 
the maximum number of closed curves you can draw on 
the surface without creating disconnected regions. On the sphere 
this number is 0—every closed curve leaves an “inside” and an 
“outside”—while for the torus it’s 2 (see Figure 5). The number is
always even; half its value is the number of “handles.”)
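
Counting closed curves is awkward to automate, but a closely related invariant is easy to compute. The sketch below swaps in the Euler characteristic, V - E + F, of a surface cut into triangles; for a closed orientable surface with g handles it equals 2 - 2g. This is a different route to the same handle count, not the definition given above, and the function names and the particular triangulations are assumptions made for illustration.

```python
def euler_characteristic(triangles):
    """V - E + F for a closed surface given as a list of triangles (vertex triples)."""
    vertices = {v for tri in triangles for v in tri}
    edges = {frozenset(pair)
             for a, b, c in triangles
             for pair in ((a, b), (b, c), (a, c))}
    return len(vertices) - len(edges) + len(triangles)

def handles(triangles):
    """Number of handles g of a closed orientable surface, from chi = 2 - 2g."""
    return (2 - euler_characteristic(triangles)) // 2

# Sphere: the surface of a tetrahedron.
sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

def torus(n=3):
    """An n-by-n grid of squares with opposite sides glued, each square cut into two triangles."""
    triangles = []
    for i in range(n):
        for j in range(n):
            a, b = (i, j), ((i + 1) % n, j)
            c, d = (i, (j + 1) % n), ((i + 1) % n, (j + 1) % n)
            triangles += [(a, b, d), (a, d, c)]
    return triangles

print(handles(sphere), handles(torus()))  # 0 1
```

Doubling the handle count recovers the curve-counting number in the text: 0 for the sphere and 2 for the torus. Stretching or refining the triangulation changes V, E, and F individually but not the combination V - E + F, which is what makes it an invariant.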

By computing and comparing invariants, mathematicians can 
prove two manifolds are different: If they differ for even one 
invariant, they are not the same (see Figure 2, page 14). The con- 
verse, however, may not be true: It’s entirely possible that two 
manifolds may share identical invariants but nonetheless be differ- 
ent. The question is, how many invariants do you need to know 
before you can conclude two manifolds are the same? 

In two dimensions there’s no problem: The only pertinent 
invariant is the number of handles. If two two-dimensional mani- 
folds have the same number of handles, then they are in fact the 
same. But in higher dimensions, the answer is far from clear. If 
mathematicians are already in possession of all the pertinent invariants,
they don’t yet know it (that is, there’s no proof).


The trouble is that invariants are hard to get hold of, 
especially when the spaces they apply to are so abstract. 
That’s what makes Seiberg and Witten’s discovery so 
exciting to mathematicians: Their work in quantum field 
theory has led to a new and simpler way of computing 
invariants for four-dimensional manifolds. 

The story actually begins around 1864, when James 
Clerk Maxwell wrote down his famous equations of elec- 
tromagnetism. Maxwell’s equations describe electrody- 
namics in terms of “fields” —electromagnetic waves that 
permeate space. By 1900, theorists had found that 
Maxwell’s equations could be simplified by introducing 
something called a vector potential, which neatly encap- 
sulates the information from which the electric and mag- 
netic effects can be derived. The vector potential is a 
mathematical structure imposed on the four-dimensional 
manifold of space-time. It is subject to certain equations 
that describe properties of the underlying manifold. 

Though mathematically pleasing, the vector potential 
seemed mysterious and unphysical. In particular, it 
appears in regions where there is no magnetic field, such as the 
exterior of a solenoid. That mystery was dispelled with the advent 
of quantum mechanics in the 1920s and the discovery of what is 
known as the Aharonov-Bohm effect, which describes the behavior
of electrons traveling around a solenoid. The vector potential 
explains nicely how an electron’s wave function can experience 
magnetic effects without a magnetic field. 

The quantum version of Maxwell’s theory came to be known as 
quantum electrodynamics, or QED (the same abbreviation mathe- 
maticians use for quod erat demonstrandum). Theorists gradually 
came to recognize a common theme underlying both classical and 
quantum theories: assigning a group to each point of four-dimen- 
sional space-time. In each case, the group consists simply of the 
rotations of a circle. The theory that emerged is called abelian 
gauge theory. The key word is “abelian.” It means that the group 
law is commutative—it doesn’t matter whether you rotate first by 
20 degrees and then by 30, or first by 30 and then by 20. In the vernacular
of mathematical physics, Maxwell’s equations are a “classical”
abelian gauge theory, while QED is a quantum abelian gauge
theory. 

Figure 5. Closed curves on a sphere
and torus.

The success of classical and quantum electrodynamics led
theorists to wonder what might happen if they changed the group.
One approach was to consider rotations of the sphere and symme- 
tries of other, higher-dimensional objects. This led to nonabelian 
gauge theory (“nonabelian” because the group law is no longer
commutative). For example, the order of operations matters when
you rotate a three-dimensional object around two different axes 
(see Figure 6). 
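
The difference between the two kinds of group can be checked numerically. The sketch below is a minimal illustration using ordinary rotation matrices: it compares rotations of a circle by 20 and 30 degrees in either order, then quarter turns of three-dimensional space about two different axes. The helper names and the choice of the x and z axes are assumptions made here.

```python
from math import cos, sin, radians, isclose

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rot2(deg):
    """Rotation of the plane (equivalently, of a circle) by deg degrees."""
    t = radians(deg)
    return [[cos(t), -sin(t)], [sin(t), cos(t)]]

def rot_x(deg):
    """Rotation of space about the x axis."""
    t = radians(deg)
    return [[1, 0, 0], [0, cos(t), -sin(t)], [0, sin(t), cos(t)]]

def rot_z(deg):
    """Rotation of space about the z axis."""
    t = radians(deg)
    return [[cos(t), -sin(t), 0], [sin(t), cos(t), 0], [0, 0, 1]]

def same(a, b):
    """Entrywise comparison that tolerates floating-point rounding."""
    return all(isclose(x, y, abs_tol=1e-12)
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

circle_commutes = same(matmul(rot2(20), rot2(30)), matmul(rot2(30), rot2(20)))
space_commutes = same(matmul(rot_x(90), rot_z(90)), matmul(rot_z(90), rot_x(90)))
print(circle_commutes, space_commutes)  # True False
```

The first comparison is the abelian case behind Maxwell’s equations; the second is the kind of order dependence that nonabelian gauge theory has to accommodate.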

The theorists were on the right track. “In the 50s it was just a 
mathematical game,” says Witten, “but by the late 60s and 70s, it 
had turned out that physics was based on nonabelian gauge theories.”

At the heart of the theory are equations known as the Yang-Mills
equations, first put forward by Chen-Ning Yang and Robert L. Mills
in 1954. Like Maxwell’s equations a century earlier, the
Yang-Mills equations describe a certain field, but this time the field
is associated with a more complicated, noncommutative group.
Roughly speaking, Yang-Mills theory is a nonabelian form of electrodynamics
that makes possible the description of particle interactions.

Just as abelian gauge theory has both classical and quantum versions,
the Yang-Mills equations can be treated either classically or
in light of quantum mechanics. Researchers saw deep mathematics
in both directions, but for applications to particle physics, physicists
had to take the quantum road.

Figure 6. Rotations in three dimensions are not commutative.

The road soon changed from pavement to gravel, and then to an 
overgrown trail, and finally even the trail gave out. Researchers 
found themselves hacking away at a dense thicket of theoretical difficulties
with dull computational machetes. The quantum
Yang-Mills theory worked fine for the so-called “weak” interactions,
which include radioactivity and properties of neutrinos, but
researchers found the going tougher in the denser, data-rich context 
of strong interactions, which hold nuclear material together. While 
enough can be computed with established methods to see that the 
theory is correct, notes Witten, “many of the most basic things are 
out of reach.”

One such intractable problem is quark confinement. Proposed 
in the 1960s as a group-theoretic explanation of particles
such as protons and neutrons, quarks are now generally accepted as 
real. But individual quarks are never seen—in nature, they only 
come in combinations (three make a proton, two make a pion, and 
so on). The absence of free quarks is called quark confinement. It 
was a hot topic in theoretical physics in the 1970s. “Both as a stu- 
dent and for a number of years afterwards, my main interest was to 
explain quark confinement,” Witten recalls. But neither he nor anyone
else made much progress.

“By the late 70s, there were computer experiments demonstrat- 
ing quark confinement, as well as real experiments that seemed to 
demonstrate it,” Witten says. “There were conjectures about what
a theoretical explanation would look like,” he adds, “but no one 
was ever able in any actual model to demonstrate those character- 
istics theoretically. So by the late 70s, most of us moved on and 
worked on other things.” 

Meanwhile, the mathematicians had gotten busy on the classical
side of the Yang-Mills equations, investigating a special case
known as the self-dual Yang-Mills equations. Taking ordinary
four-dimensional space as the underlying manifold, they found
solutions which, mimicking the nomenclature of physics, they
called instantons. The study of instantons, developed by mathematicians
such as Michael Atiyah at Cambridge University and
Karen Uhlenbeck at the University of Texas, did help solve certain
problems in physics, but it did little for the central problems, where
the quantum version of the Yang-Mills equations is inescapable.

It was unclear what promise instantons held for mathematics. 


“But then there was a very major breakthrough,” says Witten. In
1982, Simon Donaldson, who was then a graduate student at 
Oxford University, applied instantons and the self-dual Yang-Mills
equations to general four-dimensional manifolds, and derived a raft 
of new results in geometry. 

Donaldson’s instantons for the nonabelian group of rotations of 
the sphere form a high-dimensional space that mathematicians call 
a “moduli space.” Each four-dimensional manifold produces such
a moduli space of solutions. Roughly speaking, subtle information 
about the four-dimensional manifold is coded in the coarse struc- 
ture of this moduli space. The theory is every bit as complicated as 
it sounds. 

One result to emerge from Donaldson’s theory was a method for 
generating invariants for four-dimensional manifolds. The method, 
in effect, is a machine that cranks out an infinite list of numbers 
whenever a four-dimensional manifold is fed into it. Thus given 
any two manifolds, you can try to prove they’re different by feed- 
ing them simultaneously into the machine and waiting for one pair 
of numbers to be different. 

The only problem is, the crank gets harder and harder to turn. 
Mathematicians, including Taubes at Harvard, Peter Kronheimer at 
Oxford, Tomasz Mrowka at Caltech, Ronald Fintushel at Michigan
State University, and Ronald Stern at the University of California at 
Irvine, began examining Donaldson’s method more closely, look- 
ing for structure within the sequence of invariants it produces. For 
example, if each number in the list turned out to be the sum of the 
previous two numbers, then it would be enough to crank out just the 
first two numbers for each manifold. No pattern that simple 
emerged, of course, but Kronheimer and Mrowka did find algebra- 
ic relations among the invariants, so that, for any given manifold, 
it’s enough to compute a finite part of the list. (Related work of 
Kronheimer and Mrowka that resolved a longstanding conjecture in 
knot theory was featured in What’s Happening in the Mathematical 
Sciences, Volume 2.) 

The structure that Kronheimer and Mrowka found in 
Donaldson’s theory made the mathematicians wonder what made 
the crank so hard to turn. “One was led to ask whether there was 
some easier way to come about these invariants than what we were 
doing, simply because the ultimate structure was so simple,” recalls
Taubes. 

Enter Seiberg and Witten. 


“Donaldson astounded everybody in the way he introduced the 
nonabelian theory into the geometric problems,” Witten recalls. To 
physicists, there were analogies between Donaldson’s theory and 
their own approach to quantum field theory, “but a lot of it looked 
pretty bizarre from a physicist’s viewpoint,” Witten says. It didn’t 
help that the theory bristled with complications. Consequently, 
while mathematicians busied themselves with Donaldson’s theory, 
physicists by and large ignored it. 

Math and physics could have kept going their separate ways. 
However, Atiyah argued that Donaldson’s theory really did have 
significance for quantum field theory, and vice versa. His
argument finally prevailed.

“After some initial skepticism, I took the suggestion seriously,” 
Witten recalls. In 1988, he wrote a paper in which he interpreted 
Donaldson’s theory as a quantum Yang-Mills theory. “It made 
Donaldson’s theory more intelligible for physicists, but it wasn’t 
useful at the time, and I don’t think many mathematicians paid 
much attention to it.” 

The interpretation didn’t seem useful, Witten says, because it 
only led back to hard problems in physics, without offering new 
insight on how to solve them. “We just landed in our old bugaboo, 
the problems physicists had failed to solve in the 70s,” he notes. 

But then, in 1993, Seiberg renewed the attack on some of these 
intractable problems, deciding to focus on a portion of the theory 
called the supersymmetric case. Supersymmetry is a “big unresolved
chapter in theoretical physics,” says Witten. In brief, it
posits an overarching symmetry between the two main classes of 
elementary particles: fermions (which include quarks and elec- 
trons) and bosons (such as photons and gluons—the stuff that inter- 
actions are made of). The theory emerged in the 1970s. 
“Developing its properties has been one of the main themes in the- 
oretical physics ever since,” Witten says. “Many people believe 
that the next really major experimental discovery in accelerators 
may well be the proof that supersymmetry is true.” 

Seiberg “became convinced that if you restrict yourself to the 
supersymmetric case, you might be able to solve some of the tradi- 
tional unsolved problems about quantum gauge theories,” Witten 
recalls. “He also developed a point of view about how you should 
approach such problems, and a lot of the technical tools.” Then the
two physicists began collaborating on the case related to 
Donaldson’s theory.

“I must say, it was one of the most surprising experiences of my
life,” says Witten. “I didn’t think we were going to be able to solve 
that problem.” But solve it they did. What they found, in essence, 
was a new pair of equations to replace the ones that come from 
Donaldson’s theory. The new equations seem to give as much 
information as before, but with much less trouble. 

One result that flowed from the new equations was the first 
paper-and-pencil proof of quark confinement for a relativistic the- 
ory (not the full-fledged model, Witten notes, but one that’s simi- 
lar). The calculations have not only borne out some of the conjec- 
tures physicists made in the 1970s, but also revealed hitherto unan- 
ticipated physical phenomena. “Seiberg’s revival of these tradi- 
tionally intractable problems in the supersymmetric case, and 
everything that’s happened since then—it’s reinvigorated the study 
of what’s called the dynamics of the quantum gauge theories, in a 
way that has been lacking for 10 or 15 years,” notes Witten. “It’s
been quite fun to see some of these things getting solved in the last 
couple of years.” 

But back to mathematics. “I’m very far from being an expert on 
four-manifolds,” Witten says. “My interest is really in the quantum
field theories.” Nevertheless, he felt the new equations could shed
light back on Donaldson’s theory, so he worked out the precise 
implications for four-manifolds of his work with Seiberg. Witten 
conjectured that the new equations produce invariants identical to 
those from Donaldson’s theory. He happened to mention this in a 
seminar talk at MIT in October, 1994; by chance, Cliff Taubes was 
in the audience. 

“It’s an outrageously amazing conjecture, and for all intents
and purposes, it’s true,” says Taubes. “If these aren’t exactly the 
Donaldson invariants, they’re just as good, in fact better, because 
they’re a lot easier to use.” 

It wasn’t immediately clear to the mathematicians that Witten 
was right. In fact, because the idea promised to make Donaldson’s 
theory so simple, Taubes thought there was a good chance that 
Witten’s fabled intuition had finally failed him. On a trip to 
California, he passed the news of Witten’s conjecture on to Stern 
and Mrowka. A few days later, Mrowka called back: Witten’s 
equations were doing everything the mathematicians could have 
dreamed of—and more. 

With the new equations, the mathematicians redid in a few 
weeks what had taken more than a decade with the old methods, and
also knocked off many of the theory’s unsolved problems. “We were
quite embarrassed by the things we were proving, because they were
literally observations,” Taubes says. “We felt like we
were taking candy from a baby.” 

What makes the new theory so much simpler than the old? The 
answer, in short, is compactness: The moduli space in Donaldson’s 
theory is inherently noncompact, which in essence left theorists 
chasing instantons off to infinity, whereas the Seiberg-Witten
approach bypasses instantons altogether and gives rise to a compact
space. As a result “the analytical complications disappear,” notes
Uhlenbeck, adding that the technical analysis required by the new 
theory “is what I learned as a graduate student in the 60s—we just 
didn’t have the right questions!” 

Mathematicians’ earlier struggles had not been a waste of time, 
though, Taubes insists. “We learned an awful lot about differential 
equations,” he notes. More important, “we knew the questions to 
ask and the approaches to take” when the new method came along.
Uhlenbeck agrees: “In the process of developing [Donaldson’s] 
theory, the topologists came to a completely different understand- 
ing of four-manifolds, so when they got the new technical stuff to 
do it, they could get tremendous results right away.” 

At this writing, Witten’s main conjecture—that the invariants his 
equations produce are identical with those of Donaldson’s theory— 
remains open. So far, the evidence looks promising: All the invari- 
ants computed using Donaldson’s theory have been recomputed 
with the new equations, and the results agree. In the long run, it 
might not make much difference. “It’s not clear anybody’s going 
to ever think about the old equations, at least not for a long time,” 
Taubes says. “Which isn’t to say the old stuff wasn’t elegant. It 
was spectacular. But the new stuff is just as spectacular, and a huge 
amount easier.” 

There is also the tantalizing possibility that Seiberg and Witten’s 
approach will reveal entirely new mathematics—some of it, per- 
haps, even harder than the old theory. Witten, for one, expects “a 
lot of new surprises in geometry” coming from quantum field theory.
He sees the quantum version of nonabelian gauge theory as a
challenge for mathematicians in the twenty-first century. Right 
now, he says, the physicists are ahead: “Generally speaking, math- 
ematicians greatly underestimate how many secrets about geome- 
try have their natural home in physics.” 

But who knows who’ll have the last word in this family debate?


Computer Science Discovers DNA


While giving a talk last spring at MIT, Leonard
Adleman pulled a supercomputer out of his pocket. 


The TT-100, as he calls his machine, has done only one sim- 
ple calculation, but Adleman and others believe that the 
concept behind it could “evolutionize” the way computing 
is done. 

TT-100—the name stands for Test Tube-100 micro- 
liters—is a stubby, pencil-sized plastic vial containing a 
solution of micro-micro-processors more commonly known 
as DNA. In a research report in the journal Science, 
Adleman, a computer scientist at the University of Southern 
California, demonstrated the feasibility of using DNA to 
solve a mathematical problem. 

The idea has many of Adleman’s colleagues enthusiastic. 
“I think we’re going to find basic insights that are going to
be very exciting for everyone,” says Richard Lipton, a com- 
puter scientist at Princeton University. “We’re going to 
learn more about computing, and we’re going to learn more 
about DNA. That has to be good.” 

Should it come to fruition on a practical level—still a 
mighty big if—DNA computing would radically change the 
nature of computer “hardware.” For the last five decades, 
computation has been all but synonymous with electronics. 
But Adleman’s DNA computer relies on biochemistry 
instead of semiconductor physics to carry out the logical 
operations that are the basis of computing. In effect, it uses 
each strand of DNA as a separate processor, with informa- 
tion encoded in the sequence of bases that characterize the 
molecule. Computations are carried out by means of 
exquisite procedures that biochemists have developed since 
Watson and Crick’s discovery of the double helix in 1953. 

Leonard Adleman. (Photo courtesy of Leonard Adleman.)

Richard Lipton. (Photo courtesy of Barry Cipra.)

In the demonstration described in Science, Adleman used DNA to
solve a simple example of a combinatorial problem known as the
Hamiltonian Path Problem. The problem can be posed in terms of a
system of one-way roads connecting a set of cities. Having specified
one city to start in and another to stop at, is it possible to travel
from one to the other visiting every city exactly once along the way?

When the number of cities is large, the Hamiltonian Path 
Problem can be extremely difficult. It is one of the many so-called 
“NP-complete” problems, for which computer scientists conjecture 
that no efficient algorithm exists (see “A Complexity Primer” in 
What’s Happening in the Mathematical Sciences, Volume 1). 
Adleman’s example, however, has only seven cities (see Figure 1). 
Solving it is no big deal for a human—it takes only a moment to 
“see” the solution. But DNA has no visual system; getting it to find 
the answer requires a completely different way of looking at the 
problem. 
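
The question can be made concrete with a brute-force search. The
sketch below is only an illustration in Python: the seven-city road
map is invented for the example (it is not the graph in Figure 1),
and the function name `hamiltonian_paths` is likewise an assumption
of this sketch.

```python
from itertools import permutations

def hamiltonian_paths(n_cities, roads, start, end):
    """Return every route from start to end that visits each city exactly once.

    roads is a set of (from_city, to_city) pairs; each road is one-way.
    """
    middle = [c for c in range(n_cities) if c not in (start, end)]
    found = []
    # Try every ordering of the intermediate cities: (n - 2)! candidates.
    for order in permutations(middle):
        route = (start, *order, end)
        if all((a, b) in roads for a, b in zip(route, route[1:])):
            found.append(route)
    return found

# Hypothetical one-way road map on cities 0..6 (NOT Adleman's actual graph).
ROADS = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (3, 2), (2, 1), (1, 4), (4, 1), (0, 6), (2, 5)}

paths = hamiltonian_paths(7, ROADS, start=0, end=6)
print(paths)  # [(0, 1, 2, 3, 4, 5, 6)]
```

Trying every ordering is painless for seven cities, but the number of
orderings grows factorially with the number of cities, which is one
way to see why large instances are so hard.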

Figure 1. Adleman’s groundbreaking graph. It took nearly a week for
DNA to find the unique Hamiltonian path from city “0” to city “6” along
the one-way “roads” of this graph. (Figure courtesy of Science, volume
266, 11 November 1994, page 1022.)

Adleman “programmed” his DNA computer by preparing several batches
of oligonucleotides, the biotech term for short stretches of
single-strand DNA. The first batch, consisting of copies of
seven different strands, each 20 bases long, represented the cities. 
The second batch represented the one-way roads; these strands too 
were 20 bases long. The first ten bases of each road strand were 
complementary to the last ten bases of the strand corresponding to 
the road’s starting city, and the last ten bases of the road strand were 
complementary to the first ten of the road’s terminal city (see 
Figure 2). 
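
The encoding just described can be sketched in a few lines of Python.
This is a toy model under stated assumptions: the city sequences are
random strings invented for the example, the helper names are
hypothetical, and complements are taken base by base, ignoring the
direction in which real DNA strands pair.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base complement (strand direction is ignored in this toy model)."""
    return "".join(COMPLEMENT[base] for base in strand)

def make_city_strands(n_cities, length=20, seed=0):
    """One random 20-base strand per city; the sequences are illustrative only."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(n_cities)]

def make_road_strand(cities, start, end):
    """20-base road: complement of the start city's last 10 bases,
    followed by the complement of the end city's first 10 bases."""
    return complement(cities[start][10:]) + complement(cities[end][:10])

cities = make_city_strands(7)
road = make_road_strand(cities, 2, 3)

# The road's halves pair with the tail of city 2 and the head of city 3,
# so one road strand can hold those two city strands side by side.
print(complement(road[:10]) == cities[2][10:])   # True
print(complement(road[10:]) == cities[3][:10])   # True
```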


The nice thing about biotechnology is that DNA is 
plentiful and (per unit, at least) cheap to produce: $40 
these days will buy you roughly a hundred thousand 
trillion ($10^{17}$) copies of an oligonucleotide. Adleman
used only a “pinch” of each, but even that’s a huge 
number of strands, which meant he could safely 
expect this DNA haystack to produce its own needle. 

The first step of the computation was easy: 
Adleman simply dumped all the single-stranded DNA 
together. The complementary sections of the city and 
road oligonucleotides promptly began binding togeth- 
er, matching A’s with T’s and C’s with G’s, and creat- 
ing longer sequences of double-stranded DNA. Each 
such sequence represented a possible route from city 
to city. The question was whether one of them was a 
Hamiltonian path from city “0” to city “6.” 

To find the answer, Adleman first “amplified” all
the sequences that started and stopped with the desired 
segments, using the polymerase chain reaction (PCR), 
which is best known as the basis for DNA fingerprint- 
ing. Then he ran the resulting product through an 
agarose gel, which separates macromolecules accord- 
ing to their size, and kept only the portion correspond- 
ing to 140 base pairs, since only those stretches of 
DNA corresponded to paths connecting 7 cities. This 
DNA he amplified and gel-purified several more 
times. Adleman then separated the double-stranded 
DNA into single strands and “incubated” it with each
city’s oligonucleotide. He did this one city at a time, 
washing away any DNA that didn’t include the city’s 
strand of DNA. 

Finally, Adleman identified the Hamiltonian path 
itself (there’s only one), doing six separate amplifying 
reactions. The fourth reaction, for example, consisted 
of amplifying the stretch of the 140-base-pair “path”
DNA starting at city “0” and stopping at city “4.” 
The length of each substrand (as measured on an 
agarose gel) told Adleman in what order the cities 
were visited. 
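
The generate-then-filter logic of the experiment can be mimicked in
software. The Python sketch below is an analogy, not a model of the
chemistry: random walks stand in for random binding, list filters
stand in for PCR, the gel, and the city-by-city washing, and the road
map is a made-up example rather than Adleman’s graph. The final
read-out step is omitted.

```python
import random

# Hypothetical one-way road map: city -> cities reachable by one road.
ROADS_FROM = {0: [1, 6], 1: [2, 4], 2: [1, 3, 5], 3: [2, 4],
              4: [1, 5], 5: [6], 6: []}

def random_walk(roads_from, rng, max_len):
    """Mimic random binding: start anywhere, follow roads for a random number of steps."""
    city = rng.choice(sorted(roads_from))
    walk = [city]
    for _ in range(rng.randrange(max_len)):
        if not roads_from[city]:
            break
        city = rng.choice(roads_from[city])
        walk.append(city)
    return tuple(walk)

def adleman_filter(walks, n_cities, start, end, bases_per_city=20):
    # "PCR" step: keep walks that begin and end at the right cities.
    tube = [w for w in walks if w[0] == start and w[-1] == end]
    # "Gel" step: keep molecules of the right length (7 cities x 20 bases = 140).
    target = n_cities * bases_per_city
    tube = [w for w in tube if len(w) * bases_per_city == target]
    # "Washing" step: one city at a time, discard walks that miss that city.
    for city in range(n_cities):
        tube = [w for w in tube if city in w]
    return set(tube)

rng = random.Random(1)
soup = [random_walk(ROADS_FROM, rng, max_len=12) for _ in range(20_000)]
survivors = adleman_filter(soup, n_cities=7, start=0, end=6)
print(survivors)  # {(0, 1, 2, 3, 4, 5, 6)}
```

A walk with exactly seven cities that contains every one of the seven
cannot repeat any of them, so whatever survives all three filters is
a Hamiltonian path.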

Figure 2. Oligonucleotides for two “roads” (left) line up with a
complementary “city”

The entire computation took about a week, not exactly a record in
the annals of computation. But Adleman had achieved his goal: to
demonstrate that biotechnology can be used to solve an “abstract”
problem that has no connection with biochemistry. (Alexander Graham
Bell didn’t exactly ring up his assistant from across town, either.)

The edge DNA may enjoy over ordinary electronic computing 
lies in sheer numbers: Hardware engineers still think of billions as 
pretty big numbers, but biochemists routinely deal with quantities 
a hundred trillion times larger, such as Avogadro’s number, which 
is slightly more than $6 \times 10^{23}$. Once harnessed, Adleman argues,
even a micromole of DNA is capable of parallel processing at an
unparalleled level.

Parallel processing is quite different from traditional computing, 
which proceeds sequentially. Even when you have several “win- 
dows” open at once, your home computer still does only one cal- 
culation at a time. In parallel processing, by contrast, the comput- 
er does many computations simultaneously. It does so by splitting 
the task at hand among many independent processors. The basic 
idea is that a thousand cheap computers can do a job faster than one 
super-deluxe model, even if each cheap machine is a hundred times 
slower than the expensive one.

DNA computing takes that concept “to an almost absurd 
extreme,” says Adleman: Each “processor” may be a trillion times 
slower than its high-tech electronic counterpart, requiring anything 
from minutes to hours for a single “calculation.” But when multiplied
by, say, $10^{20}$ (which is generally thought to be near the limit
of what you can use in a realistic molecular computer), DNA comes 
out ahead by a factor of a hundred million. According to Adleman, 
DNA computation might also be a billion times more energy effi- 
cient than traditional computers and store information in about one 
trillionth the space required by media such as video tape. 
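
The arithmetic behind both comparisons is a single division, as the
short Python check below shows. It assumes an idealized job that
splits perfectly among processors with no communication cost; the
function name `speedup` is just a label for this sketch.

```python
from fractions import Fraction

def speedup(n_processors, slowdown_per_processor):
    """Ideal throughput of a parallel machine relative to one fast sequential machine."""
    return Fraction(n_processors, slowdown_per_processor)

cheap_cluster = speedup(1_000, 100)       # a thousand machines, each 100 times slower
dna_computer = speedup(10**20, 10**12)    # 10^20 strands, each a trillion times slower

print(cheap_cluster)   # 10
print(dna_computer)    # 100000000, i.e. a hundred million
```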

Adleman was sure he had broken ground in an exciting new 
field. What he didn’t expect was how quickly the idea of molecu- 
lar computing would catch on. 

Lipton was the first aboard. He and Adleman met at a conference
in Santa Fe soon after Adleman’s paper appeared in Science.
A few weeks later, Adleman heard by e-mail that Lipton had written
a note on the subject. “I was pleasantly surprised,” Adleman
says. “Lipton had simplified the whole point of view, abstracted 
out what was important, and demonstrated that molecular comput- 
ing might be applicable to a wide variety of problems, a much 
wider variety than at first one might have thought.”

Lipton observed that DNA computing boils down to four basic
procedures. One is simply to detect whether a test tube contains 
any DNA at all. A more complicated procedure isolates all strands 
of DNA with a particular substring of bases, much as a targeted 
mail-marketing program might sift a database for addresses with a 
given ZIP code and income level. A third, very simple, procedure 
consists of pouring two test tubes together. The fourth procedure is 
to amplify a quantity of DNA, say doubling it, perhaps repeatedly. 

Instead of using DNA to search for an unknown Hamiltonian 
path, Lipton’s theory relies on the abundance of DNA to produce 
all paths in a particular, highly structured graph (see Figure 3). 
These paths are used, in turn, to represent m-bit numbers. For 
example, in Figure 3 there are eight paths from $a_1$ to $a_4$. The path
$a_1 x_1' a_2 x_2 a_3 x_3 a_4$ can be interpreted as the binary number 011,
$a_1 x_1 a_2 x_2' a_3 x_3 a_4$ represents 101, and so forth.

With DNA, says Lipton, it’s possible to think in terms of preparing
bulk quantities of, say, all 70-bit numbers. This makes it
conceivable to attack a notoriously difficult computer science problem:
determining whether a given logical expression with several dozen 
“Boolean” variables—variables that take only two values 
(True/False, or, in computerese, 1/0)—is ever “satisfied,” that is, 
whether there is any assignment of values to the variables that 
makes the expression true.

Figure 3. A graph whose paths represent binary numbers.

Simple logical expressions, such as “(A or B) and (not A or not B)”
(or, symbolically, $(A \lor B) \land (\lnot A \lor \lnot B)$), are
easy to check as satisfiable or not, since it’s easy to try all
possible truth assignments (in the example, letting A be true and B
false is one satisfying assignment). But for complicated expressions,
with lots of variables, checking satisfiability is not so easy. It
is, in fact, the original NP-complete problem.

Nonetheless, Lipton showed that quite large satisfiability prob- 
lems could be solved by a DNA computer, and that the number of 
steps required is roughly proportional to the length of the logical 
expression. In essence, a series of “merge” and “separate”
procedures isolates all DNA strands that correspond to satisfying
assignments of a given Boolean expression, and a final “detect”
procedure tells whether anything is left.
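
The merge, separate, and detect steps translate naturally into set
operations. The following Python sketch is a loose software analogy
of the scheme, not Lipton’s laboratory protocol: a “test tube” is a
set of bit strings, the function names are invented here, and
`separate` is applied to the same tube once per literal, which in a
lab would require working from copies.

```python
from itertools import product

def all_assignments(n_vars):
    """Initial test tube: one 'strand' per n-bit string (bit i is variable i)."""
    return set(product((0, 1), repeat=n_vars))

def separate(tube, var, value):
    """Keep only the strands whose bit for `var` equals `value`."""
    return {s for s in tube if s[var] == value}

def merge(*tubes):
    """Pour several tubes together."""
    return set().union(*tubes)

def detect(tube):
    """Report whether the tube contains any DNA at all."""
    return len(tube) > 0

def satisfying(n_vars, clauses):
    """clauses: list of clauses; each clause is a list of (var, wanted_value) literals."""
    tube = all_assignments(n_vars)
    for clause in clauses:
        # A strand survives a clause if it satisfies at least one of its literals.
        tube = merge(*(separate(tube, var, value) for var, value in clause))
    return tube

# (A or B) and (not A or not B), with A as variable 0 and B as variable 1.
clauses = [[(0, 1), (1, 1)], [(0, 0), (1, 0)]]
tube = satisfying(2, clauses)
print(detect(tube), sorted(tube))  # True [(0, 1), (1, 0)]
```

The loop performs one `separate` per literal and one `merge` per
clause, so the number of tube operations grows with the length of the
expression, while the size of the initial tube grows as $2^n$.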

Satisfiability problems don’t come up often in practice—per- 
haps, notes Adleman, because there hasn’t been a practical way of 
solving them. But many computer algorithms include search rou- 
tines that can, at least theoretically, be speeded up by exploiting the 
extreme parallelism of DNA. The speedup factor could be as much 
as a trillion. 

One possible application is in the area of cryptography. Lipton’s 
graduate students Dan Boneh and Christopher Dunworth have plot- 
ted a biochemical attack on a system known as the Data Encryption 
Standard (DES). DES was developed at IBM in the 1970s and adopted,
with input from the National Security Agency, as a federal standard
for government use. Its design resembles that of a standard
combination lock: The basic right-left-right procedure is the same 
for each lock, but each user has a different, secret combination. For 
DES, the combination, or “key,” is a single 56-bit string of 1’s and 
0’s, which means there are roughly $10^{17}$ different possibilities.
That’s enough to defeat silicon-based computers for the foreseeable 
future, but well within the reach of a small vat of DNA. 

Boneh and Dunworth worked out the sequence of biochemical 
procedures that corresponds to running the DES algorithm. They 
found it’s possible to crack the code in fewer than a thousand 
steps—about four months’ work with current laboratory tech- 
niques. 

The analysis of DES is more of a thought experiment than any 
real threat to cryptography, Lipton notes. For one thing, Boneh and 
Dunworth’s brute-force DNA attack can easily be turned back by 
doubling the number of bits in the DES key: Even chemists think 
$10^{34}$ is a big number.
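
The key counts are easy to verify. The Python lines below compare
them with the figure of $10^{20}$ strands quoted earlier as a
realistic ceiling for a molecular computer; the variable names are
only labels for this check.

```python
keys_56 = 2**56      # one key per 56-bit string
keys_112 = 2**112    # key length doubled to 112 bits
dna_ceiling = 10**20 # rough limit on usable strands, as quoted in the text

print(f"{keys_56:.2e}")    # 7.21e+16, roughly 10^17
print(f"{keys_112:.2e}")   # 5.19e+33, roughly 10^34
print(keys_56 < dna_ceiling < keys_112)  # True
```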

It’s also not clear how easy it will be to scale up DNA computing
technology to compete with current electronic computing. Reliability
is a key concern: Biochemists live with error rates that
would force a modern computer manufacturer into bankruptcy. 
But, notes Lipton, electronic computing had its own troubles with 
reliability back in the days when machines ran on vacuum tubes. 
Biochemistry might well experience a technological revolution as 
dramatic as the invention of the transistor. 

Indeed, says Lipton, DNA computing will probably spur 
advances in biotechnology even if DNA computing itself never 
pans out. On the one hand, computation’s need for large-scale, reli- 
able systems will push molecular biologists to develop better and 
better methods; “We’re stressing [molecular biologists] in ways 
that they haven’t traditionally been stressed,” says Lipton. On the 
other hand, as computer scientists learn more about the ins and outs 
of molecular biology, they should be able to translate insights from 
computer science into useful knowledge for biologists. “If you 
look broadly enough at what molecular biology does when it 
manipulates DNA, it does a funny kind of computation,” Lipton 
explains. “If you’re sequencing DNA, you’re performing experi- 
ments and making inferences, and in some generalized sense that’s 
computing. Maybe we as computer scientists can help organize 
that in a different way that might be more efficient or more error 
tolerant.” In a similar way, computer scientists have already helped
solid-state physicists get the most out of their integrated circuits. 
“We’re trying to make this more of a two-way street,” says Lipton. 

The final, perhaps determining question for DNA computing, 
say Adleman and Lipton, is whether computer scientists can iden- 
tify any scientifically (or economically) interesting problems that 
can be solved more efficiently by taking advantage of DNA’s 
“absurd” parallelism than by the plodding gallop of electrons
flipping one switch after another. If they build it, will anyone come?

“I don’t think that this very parallel regime will solve every
problem that we traditionally put on a supercomputer—that’s prob- 
ably not in the cards,” says Lipton. “More likely, some very spe- 
cialized collection of them, like these cryptography problems, are 
going to be mapped and programmed in this way. In other words, 
people are going to realize there’s a way to solve them that’s incred- 
ibly parallel. And if that’s done for enough interesting problems, 
then there will actually be some economic pressure to build DNA 
computers.” 

Tamar Schlick. (Photo courtesy of Tamar Schlick.)

Adleman has even started thinking beyond DNA. He sees promise in
tailoring computers to fit a whole continuum of purposes. “In
practice, there’s no such thing as a general-purpose computer,” he
says. “All computers are special-purpose,”
working swiftly on some problems and poking along on others. 
A superfast but essentially sequential electronic machine and 
Adleman’s leisurely but ultra-parallel DNA computer lie at 
opposite extremes. One might wonder, notes Adleman, whether 
some practical problems are best handled by something in 
between, say, a computer with ten billion processors, each running
a few hundred operations per second and “communicating” via
connections with, say, a thousand of its neighbors.

In one sense the answer is obvious; such computers already
exist. If you think about it, you’re using yours right now. 



If DNA actually stayed 
as straight as the text- 
books show, it wouldn’t 
even fit inside a cell, let 
alone interact with 
proteins. 


Figure 4. The double helix consists of complementary base pairs (A and
T, C and G) attached to a “ribbon” of sugar phosphate.

Figure 5. “Snapshots” from a simulation of DNA supercoiling. (Figure courtesy of Tamar Schlick and Wilma Olson,
from J. Molecular Biol. (1992), vol. 223, pages 1089-1119, Academic Press.)

Figure 1. The Intel Pentium® processor. (Photo courtesy of Intel Corporation.)

Divide and Conquer


Thomas Nicely didn’t set out to discover a bug in the brand new
Pentium processor. It just worked out that way. 

Nicely, a professor of mathematics at Lynchburg College 
in Lynchburg, Virginia, is the man responsible for calling public 
attention to a design problem in the Pentium chip. The result 
was a public relations nightmare for Intel, the Pentium’s manufac- 
turer. 

Introduced in 1993, the Pentium chip was touted as bringing a 
new level of speed and power to personal computing. By fall, 
1994, Intel had sold nearly a million Pentiums, powering IBMs, 
Packard Bells, and other computers. At around $1000 a pop, the 
Pentium promised Intel a tidy profit. Then Nicely dropped his 
bombshell: The Pentium chip had trouble with arithmetic. When 
asked to divide one number by another, it sometimes gave the 
wrong answer. 

In a sense, all computers do division wrong. When a comput- 
er’s floating point unit, which handles numbers in decimal form, 
divides one number by another, it almost always makes a small 
error. The same thing happens with multiplication, and even with 
addition and subtraction. That’s because computer arithmetic is 
usually done not with exact numbers, but with approximations hav- 
ing a fixed number of decimal places; every operation is rounded to 
give a result with the same fixed number of decimal places. For 
example, 1/3 = 0.33333... is stored not as an infinite sequence of 
3’s, but as some finite string. 
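
Python’s `decimal` module makes the effect easy to see. The sketch
below fixes the working precision at 10 significant digits, an
arbitrary choice for illustration; real floating point units round in
binary rather than decimal, but the principle is the same.

```python
from decimal import Decimal, getcontext

getcontext().prec = 10   # keep 10 significant digits (illustrative choice)

third = Decimal(1) / Decimal(3)
print(third)       # 0.3333333333
print(third * 3)   # 0.9999999999, not 1
```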

These round-off errors are predictable, and computer scientists 
have developed rules for when to round up and when to round 
down. Floating point units typically give results that are valid to 
between 10 and 20 decimal places; many offer “extended precision”
capabilities that guarantee more digits of accuracy. Round-off
errors seldom matter in routine calculations, but in scientific com- 
putations, which may involve billions of operations, the accumula- 
tion of error is a serious concern. If, for example, ten billion num- 
bers, each “good” to ten decimal places, are added together, the 
computer’s answer may be meaningless in every digit. In practice, 
there is usually about as much rounding up as down, and a
statistical principle known as the random-walk phenomenon says that the
result is probably good to five decimal places, which may or
may not be satisfactory.
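
Both estimates follow from a back-of-the-envelope calculation, shown
here in Python. It assumes each number carries an error of about
$10^{-10}$ and, for the random-walk figure, that the errors are
independent with no preferred sign, so that they grow like the square
root of the number of terms.

```python
import math

n_terms = 10**10          # ten billion numbers
per_term_error = 1e-10    # each "good" to ten decimal places

worst_case = n_terms * per_term_error              # every error pushes the same way
random_walk = math.sqrt(n_terms) * per_term_error  # errors partly cancel

print(worst_case)    # about 1: the error swamps the decimal places
print(random_walk)   # about 1e-05: good to roughly five decimal places
```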

The Pentium’s problem was that while it guaranteed 19 digits of
accuracy for each floating point operation, it sometimes delivered
far fewer. The chip, as first manufactured, was missing part
of its division algorithm. The absent component belonged to a 
“look-up” table that the chip uses for division—something like 
not knowing how many 6’s are in 43. The Pentium’s particular 
blind spot for division rarely displays itself; that’s why the error 
slipped past Intel’s quality control tests. When it occurs, howev- 
er, the error can appear in the fifth decimal place—a performance 
a hundred billion times worse than the chip purports to deliver. 
Intel discovered the error for itself in the summer of 1994, and 
moved quickly to correct it. The company decided, however, not 
to bother owners of the defective chip with a technical advisory, 
figuring the error arose so seldom that nobody 
would be affected. 
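
A division can be checked by multiplying back. The Python sketch
below uses the pair 4195835 and 3145727, a test case that circulated
widely once the flaw became public (the numbers are not taken from
this article). On sound hardware the quotient is about 1.33382; the
flawed chips were reported to return about 1.33374, wrong in the
fifth significant digit. The function name is hypothetical.

```python
from fractions import Fraction

def division_residual(x, y):
    """x - (x / y) * y, which should be zero up to round-off on a sound divider."""
    return x - (x / y) * y

X, Y = 4195835.0, 3145727.0
quotient = X / Y
exact = Fraction(4195835, 3145727)

# Relative error of the hardware quotient, computed in exact rational arithmetic.
relative_error = abs(Fraction(quotient) - exact) / exact

print(quotient)
print(abs(division_residual(X, Y)))
print(relative_error < Fraction(1, 10**15))  # True on a correct divider
```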

That was Intel’s second mistake. 

Hardware bugs are nothing new. Every 
machine since the abacus does quirky things 
now and then, as anyone whose PC just crashed 
can attest. But the Pentium bug was especially 
embarrassing, because it involved an operation 
most people master in grade school. Still, an 
error that crops up once in a billion times— 
who would ever notice? Or so the thinking 
went. 

Enter Nicely, and a problem known as the 
Twin Prime Conjecture. Prime numbers have 
long fascinated mathematicians. In Book IX of 
the Elements, Euclid proved that there are infi- 
nitely many primes. His argument is simple, 
elegant, and compelling: Given any finite list 
of primes, their product plus 1 is a number not 
divisible by any of the primes in that list. But 
every number has at least one prime divisor, so 
no finite list can possibly include every prime. 
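
Euclid's argument can be tried out on a concrete list. The sketch below is an illustration only; the particular list of six primes and the helper name `smallest_prime_factor` are choices made here, not part of the original argument.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


primes = [2, 3, 5, 7, 11, 13]   # any finite list of primes will do
candidate = prod(primes) + 1    # the product of the list, plus 1

# Dividing by any listed prime leaves remainder 1, so none of them divides it.
remainders = [candidate % p for p in primes]

# The candidate need not be prime itself; it only needs a prime factor
# that is missing from the list.
new_prime = smallest_prime_factor(candidate)
print(candidate, remainders, new_prime)
```

Note that the product plus 1 is not always prime; the argument only needs it to have some prime divisor outside the list.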
In 1896, Charles-Jean de la Vallée Poussin and 
Jacques Hadamard (independently) proved the 
Prime Number Theorem, which states that there are approximately
$x/\ln x$ primes less than any given (real) number $x$, where
$\ln x$ is the natural logarithm of $x$. Thus, for example, there
are approximately $100/\ln 100 \approx 22$ primes less than 100,
which is not too far from the exact value of 25.

Thomas Nicely. (Photo courtesy of Lynchburg College, Lynchburg,
Virginia.)
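
The comparison between the exact count and the estimate $x/\ln x$ is easy to reproduce. The following is a minimal sketch using an ordinary sieve of Eratosthenes; the function name `primes_below` and the sample values of $x$ are choices made for this illustration.

```python
from math import log


def primes_below(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes strictly less than limit."""
    if limit < 3:
        return []
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


for x in (100, 10_000, 1_000_000):
    exact = len(primes_below(x))
    estimate = x / log(x)
    print(f"x = {x:>9,}  exact = {exact:>6,}  x/ln x = {estimate:>10.1f}")
```

The estimate consistently undershoots in this range, but the ratio of the two counts drifts toward 1 as $x$ grows, which is what the theorem asserts.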

Armed with the prime number theorem, mathematicians have 
solved many mysteries about prime numbers, but many others 
remain. One especially vexing problem concerns “twin primes”:
pairs of primes that differ by 2, such as 3 and 5, 5 and 7, 11 and
13, etc. The question is this: Are there infinitely many such pairs,
or does the property die out somewhere down the (infinite) list of 
primes? 

Heuristic reasoning has led number theorists to conjecture that
an analog of the prime number theorem holds for twin primes: The
number of such pairs up to $x$ should be roughly proportional to
$x/(\ln x)^2$. In 1919, the Norwegian mathematician Viggo Brun
proved that, for large values of $x$, the number of twin primes is
less than $100x/(\ln x)^2$. (That upper bound has more recently been
brought down to around $6x/(\ln x)^2$.) But no lower
bound has been found that would prove the infinitude of twin
primes.
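
The conjectured growth rate can be checked against actual counts for small limits. This is a sketch only; the function name `twin_prime_pairs` and the convention of counting pairs $(p, p+2)$ with $p + 2 \le x$ are assumptions made here.

```python
from math import log


def twin_prime_pairs(limit: int) -> list[tuple[int, int]]:
    """All pairs (p, p + 2) of primes with p + 2 <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [(p, p + 2) for p in range(3, limit - 1)
            if is_prime[p] and is_prime[p + 2]]


for x in (10**3, 10**4, 10**5, 10**6):
    count = len(twin_prime_pairs(x))
    scale = x / log(x) ** 2
    # If the conjecture holds, count / scale should settle toward a constant.
    print(f"x = {x:>9,}  pairs = {count:>6,}  ratio = {count / scale:.3f}")
```

The counts agree with the corresponding rows of Brent's table in Figure 3.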

Brun also proved a curious theoretical result: If you reciprocate 
all the twin primes and add them together, forming the sum

    $(1/3 + 1/5) + (1/5 + 1/7) + (1/11 + 1/13) + \cdots,$

the sum is finite; the infinite series converges to some number $B$,
which number theorists call Brun’s sum.

Estimating Brun’s sum numerically, however, is a pain, because 
twin primes occur so unpredictably. Any finite portion of Brun’s 
sum gives an underestimate, just as any finite portion of the sum 
$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$ gives a value a bit smaller than 2. With
Brun’s sum, there’s no way of knowing how close your estimate 
comes. You may have stopped just short of a long prime-rich 
stretch of numbers, which would make your estimate way too 
small. 

Thinking heuristically, number theorists conjectured that adding
a “correction” term proportional to $1/\ln p$ onto the part of Brun’s
sum that stops at $\frac{1}{p} + \frac{1}{p+2}$ will give an estimate
with error less than some multiple of $1/(\sqrt{p}\,\ln p)$. In 1974,
mathematicians Daniel Shanks and John Wrench, Jr., in the
computation and mathematics
department of the Naval Ship Research and Development Center in
Bethesda, Maryland, tested this conjecture by looking at all the 
twin primes among the first two million prime numbers (see 
Figure 2). The next year, Richard Brent at the Australian National 
University tabulated all twin primes up to a hundred billion—there 
are 224,376,048 pairs—and obtained an estimate of 1.90216054 for 
Brun’s sum (see Figure 3, page 44). 
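
The effect of the correction term can be seen even at very small limits. The sketch below rests on an assumption the text does not state: it takes the correction to be $4c_2/\ln n$, where $c_2 \approx 0.6601618158$ is the twin prime constant. The text says only that the term is proportional to $1/\ln p$, so the constant and the function names here should be treated as illustrative.

```python
from math import log

# Assumed value of the twin prime constant c2; it does not appear in the text.
TWIN_PRIME_CONSTANT = 0.6601618158


def brun_partial_sum(limit: int) -> float:
    """Sum of 1/p + 1/(p + 2) over twin prime pairs with p + 2 <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    total = 0.0
    for p in range(3, limit - 1):
        if is_prime[p] and is_prime[p + 2]:
            total += 1.0 / p + 1.0 / (p + 2)
    return total


def brun_estimate(limit: int) -> float:
    """Partial sum plus the assumed correction term 4 * c2 / ln(limit)."""
    return brun_partial_sum(limit) + 4.0 * TWIN_PRIME_CONSTANT / log(limit)


for n in (10**3, 10**4, 10**5, 10**6):
    print(f"n = {n:>9,}  partial = {brun_partial_sum(n):.6f}  "
          f"corrected = {brun_estimate(n):.6f}")
```

The raw partial sums creep upward very slowly, while the corrected values are already close to 1.90 at a limit of one thousand.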

After that, most number theorists stopped worrying much about
Brun’s sum; they had plenty of other interesting problems to keep
their computers busy. One was the unexpected development of
cryptographic systems based on the apparent difficulty of factoring
large numbers (see “The Secret Life of Large Numbers,” pages
90-99). But in 1993, Thomas Nicely decided to have another go at
Brun’s sum.

Nicely had been looking for a problem that could be attacked
with desktop computers. “I felt if I could come up with such a
problem, it might serve as a model for other people working in
small college environments, who don’t have access to
supercomputers,” he says. Nicely eventually settled on studying the
twin primes. He hoped to push Brent’s computation up into the
trillions.

Figure 2. Estimates of Brun’s sum based on the occurrence of twin primes
among the first million prime numbers (left) and the second million prime
numbers (right). Horizontal axis: thousands of primes. (From “Brun’s
constant,” Daniel Shanks and John W. Wrench, Jr., Mathematics of
Computation 28, no. 125 (1974), page 295.)


Conceptually, there’s no difficulty. The first step is to list all 
primes into the trillions. Nowadays this doesn’t take very long
(storing the results is problematic, however; Nicely processes
numbers in batches of a billion). The second step is to pick out all
twin primes. Finally, one reciprocates all the survivors and adds
them up, tacking on the conjectured correction term at the end. 
What appealed to Nicely was that the computation could be split up 
among several machines, each searching for primes in separate 
stretches. 
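
The batch-by-batch structure of such a computation can be sketched with a segmented sieve. This is not Nicely's program; it is a small illustration of the idea, with hypothetical function names and a batch size far smaller than a billion. The one subtlety is that a twin pair may straddle two batches, so the last prime seen is carried forward.

```python
from math import isqrt


def small_primes(limit: int) -> list[int]:
    """Primes up to and including limit (ordinary sieve)."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for i in range(2, isqrt(limit) + 1):
        if flags[i]:
            for j in range(i * i, limit + 1, i):
                flags[j] = False
    return [i for i, f in enumerate(flags) if f]


def primes_in_batch(lo: int, hi: int, base: list[int]) -> list[int]:
    """Primes p with lo <= p < hi, sieved using the base primes."""
    flags = [True] * (hi - lo)
    for q in base:
        if q * q >= hi:
            break
        # First multiple of q in the batch, but never q itself.
        start = max(q * q, ((lo + q - 1) // q) * q)
        for m in range(start, hi, q):
            flags[m - lo] = False
    return [lo + i for i, f in enumerate(flags) if f and lo + i >= 2]


def count_twin_pairs(limit: int, batch: int) -> int:
    """Count pairs (p, p + 2) with p + 2 <= limit, one batch at a time."""
    base = small_primes(isqrt(limit) + 1)
    count = 0
    prev = None  # last prime seen, carried across batch boundaries
    for lo in range(0, limit + 1, batch):
        hi = min(lo + batch, limit + 1)
        for p in primes_in_batch(lo, hi, base):
            if prev is not None and p - prev == 2:
                count += 1
            prev = p
    return count


print(count_twin_pairs(100_000, batch=1_000))
```

Because each batch needs only the base primes and its own range, separate stretches could in principle be handed to separate machines, with the boundary pairs reconciled afterward.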

Nicely started in 1993 with five computers, all running then-
state-of-the-art “486” chips. In March, 1994, he added a Pentium
machine. 

The hardest problem, Nicely knew, would be deciding whether
the computations were trustworthy. Accumulating round-off error
was one problem: Expecting upwards of 100 billion pairs of twin
primes, Nicely knew he would need to compute reciprocals to a
very high accuracy to improve on Brent’s estimate. He also had to
worry that some error in his program (or in the operating system
of the computers he planned to use) might make nonsense of the
computation.

To be sure he didn’t inadvertently skip any primes, Nicely checked
his tallies of primes against previously published results. To be
doubly sure, he decided to compute Brun’s sum in two different
ways: first using each computer’s floating point unit to calculate
each reciprocal to 19 digits, and then using an ultra-precision
algorithm that had been developed by Arjen Lenstra at Bell
Communications Research (Bellcore) in Morristown, New Jersey.
Nicely set the ultra-precision code to work with 26 digits accuracy.

By June, he had results, and an obvious error. “The computed check
value for $\pi(x)$ [the number of primes up to $x$] disagreed with the
published value,” Nicely recalls. He also observed that the round-off
error in his ultra-precision calculations was accumulating much
faster than expected. After a long search for mistakes in his
program, he finally found a subtle error in the Borland C++ 4.02
compiler that translated things into instructions for the computer.
“For some time I believed this to be the source of my woes,” he
says. To reassure himself, he extended the ultra-precision
algorithm to 53 digits accuracy. He also began running every
computation on at least two separate machines, so that he could
compare results.

Figure 3. Twin primes and Brun’s sum up to 100 billion, studied by
Richard Brent. (From “Irregularities in the distribution of primes and
twin primes,” Richard P. Brent, Mathematics of Computation 29, no.
129 (1975), page 45.)

    n            pi_2(n)        B*(n)
    10^3         35             1.90030531
    10^4         205            [illegible]
    10^5         [illegible]    [illegible]
    10^6         8169           1.90191335
    10^7         58980          1.90218826
    10^8         440312         1.90216794
    10^9         3424506        1.90216024
    2 x 10^9     6388041        1.90215957
    3 x 10^9     9210144        1.90215977
    4 x 10^9     11944438       1.90215950
    5 x 10^9     14618166       1.90215984
    6 x 10^9     [illegible]    [illegible]
    7 x 10^9     [illegible]    [illegible]
    8 x 10^9     22384176       1.90216011
    9 x 10^9     24911210       1.90216037
    10^10        27412679       1.90216036
    2 x 10^10    51509099       1.90216076
    3 x 10^10    74555618       1.90216064
    4 x 10^10    96956707       1.90216031
    5 x 10^10    118903682      1.90216031
    6 x 10^10    140494397      1.90216033
    7 x 10^10    161795029      1.90216032
    8 x 10^10    182855913      1.90216040
    9 x 10^10    203710414      1.90216053
    10^11        224376048      1.90216054

It was a good thing he did. “Once I worked around the compil- 
er bug, the excess rounding error went away in the ultra-precision 
routine,” Nicely recalls. “At that point I expected the computation 
to behave correctly.”

It didn’t. 

In early October, Nicely got comparison results off a 486 
machine for the twin primes up to a trillion—a computation the 
faster Pentium chip had already completed. “Five minutes later, I 
knew the error was still there, and it was worse than I had thought,” 
he says. The ultra-precision results from the two computers dif- 
fered in their last 20 decimal places. Round-off couldn’t be the cul- 
prit; one of the two machines—possibly both—was doing some- 
thing wrong. 

Fortunately, Nicely had stored intermediate results from each 
machine at the end of each billion numbers, so he could quickly 
find a range where the two machines differed. Within a few days 
he had isolated the problem: For some reason, Nicely’s Pentium 
machine had miscalculated the reciprocals of the twin primes 
824,633,702,441 and 824,633,702,443. For those two numbers, the 
Pentium, whose floating point unit promised 19 digits accuracy, 
was giving only nine. 
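
The kind of cross-check that exposed the error, comparing a hardware reciprocal against an independent high-precision value, can be sketched in a few lines. This is an illustration only, not Nicely's code: it uses Python's `decimal` module as the reference, a 30-digit working precision chosen arbitrarily, and ordinary double precision (about 16 digits) rather than the Pentium's 19-digit extended format.

```python
from decimal import Decimal, getcontext

# Working precision for the reference value (an arbitrary choice here).
getcontext().prec = 30


def reciprocal_mismatch(n: int) -> float:
    """Relative difference between the hardware reciprocal and a 30-digit reference."""
    hardware = Decimal(1.0 / n)          # exact decimal value of the double result
    reference = Decimal(1) / Decimal(n)  # computed in software to 30 digits
    return float(abs(hardware - reference) / reference)


for p in (824_633_702_441, 824_633_702_443):
    print(p, reciprocal_mismatch(p))
```

On a correctly working floating point unit the mismatch is at the level of the format's own rounding, around one part in $10^{16}$ for doubles. A result that agrees to only nine digits would show up as a mismatch near $10^{-9}$.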

Using special-purpose debugging software, Nicely traced the 
error to a single instruction at the Pentium computer’s assembly 
level; this led him to believe the culprit had to be the floating point 
unit itself. However, “even at that point I was trying to think of
something else that could cause [the error],” such as a virus or a
problem with his machine’s data bus, Nicely recalls. “But eventually
I was able to try the calculation on other Pentium systems, and
all of them made the same error, and the only thing they had in 
common was the chip.” 

Intel’s Pentium chip couldn’t divide. 

“It was just stunning,” Nicely recalls. “I couldn’t believe that
the chip would have been released with that error in it.” But it had
been. Moreover, Nicely got no response when he contacted Intel
with his discovery. Finally, he posted the news on the Internet. The
Net result was a public embarrassment for the chip’s manufacturer.
Intel finally acknowledged the error, but insisted that few of its
customers would ever be affected. To get a replacement chip, the 
company first said, customers would have to demonstrate a need for 
one. The outcry was deafening, and Intel finally backed off, agree- 
ing to replace the chip on demand. 

Although its Pentium chip was the butt of many jokes, Intel may 
have the last laugh yet. The chip “sold like hotcakes” for Christmas
1994, says company spokesman Thomas Waldrop. By the middle
of 1995, Intel had sold tens of millions, more than enough to offset
the $475 million write-off it took for offering replacements. The
chip has also seen several speed upgrades; a 150 megahertz model
is expected by late 1995. Intel also has a new chip, the P6, waiting
in the wings.

Nicely could put such a chip to use. His original goal has now 
been met. He reports finding 135,780,321,665 pairs of twin primes 
in the first 100 trillion numbers, and an estimate for Brun’s sum of 
1.9021605778 (see Figure 4). As a result of the publicity, he wound
up in touch with Martin Kutrib and Jörg Richstein at the University
of Giessen in Germany, who had also been pursuing Brun’s sum into
the trillions. They have been trading results, which, Nicely notes,
provides yet another check on the computation.

Indeed, says Nicely, “I’m really distrustful of any
result that comes from a single machine.” He now uses at least two 
different computers for all calculations. And with good reason: 
“Five more times since last October I’ve gotten different results 
from two machines,” he notes. Two he traced to faulty memory 
chips, two to bugs in a memory manager, and one to a hard-disk 
failure. 

To Nicely, the bugs and failures are no surprise. “They’re much 
more prevalent—and important—than most people believe or want 
to believe,” he says. Despite all evidence to the contrary, too many 
people believe that if an answer comes out of a machine, it has to 
be right. As computers run more and more of the equipment that 
directly affects our lives—everything from flight control systems to 
X-ray machines—computer designers will need to compensate for 
the unavoidable imperfections of their products. “It’s something 
that needs to be taken a lot more seriously,” says the man who took
a math problem seriously enough to track down a once-in-a-billion 
error. 

Figure 4. Graphical form of Nicely’s estimate for Brun’s sum in the range from 10 billion to 100 trillion. The data points
up to 100 billion were computed by Richard Brent in 1975.

Figure 1. The theory of linear algebra carries over into the high-dimensional world of digital images with the
computation of “eigenfaces” in an application of control theory to an important problem in pattern recognition (see page 54).
(Figure courtesy of the Harvard Robotics Lab.)

The Gentle Art of Control


Lao-tzu was not what you’d call a control freak. In his self-help
book, Tao Te Ching (The Book of the Way), the fifth-century- 
BC Chinese philosopher points out that 


True mastery can be gained 
by letting things go their own way. 
It can’t be gained by interfering.* 


Two-and-a-half thousand years of “progress” have, in some 
eyes, rendered Lao-tzu’s outlook obsolete. We tend to think of 
technology as exerting control over nature, making it do whatever 
we want, whether it be focusing laser light on the retina of an oph- 
thalmic patient or making a hundred tons of luggage and jet fuel 
stay up in the air. But there are other ways to look at the way tech- 
nology works. In fact, one very modern branch of mathematics can 
be viewed as confirmation of the Taoist outlook on life. 

Ironically, the Tao-friendly branch of mathematics is called 
Control Theory. 

Roughly speaking, mathematical control theory is what keeps 
technology from doing itself in. More poetically, perhaps, it keeps 
technology in harmony with nature. The basic idea is to understand 
exactly how a physical system works—how it responds to various 
changes in its environment and where the inevitable errors and 
uncertainties come from—and then to plan strategies that steer the 
system along a desired course, making use of the system’s natural 
tendencies rather than fighting against them. 

Mathematics is essential from the get-go. Understanding a sys- 
tem starts with making a mathematical model—a description of the 
system in terms of numerical quantities and equations that express 
relationships among them. Given a model, the control theorist can 
explore effects of one variable on another, identify which parame- 
ters the system is most sensitive to, and predict what the system 
will do under various conditions—all without crashing any planes 
or blowing up any chemical plants. 

Mathematical models provide the basis for computer algorithms
that will control the actual system. These algorithms take input
from sensors and send output to actuators. Mathematics again
plays a crucial role in making the algorithms work quickly and
accurately.

*This quote and the quotes from pages 51 and 53 are from “Tao Te Ching,” A
New English Version with Foreword and Notes, by Stephen Mitchell,
© HarperPerennial 1991.

Such controls are vital components in systems ranging from air- 
craft flight stabilizers to industrial robots to VCRs and computer 
disk drives. Control theory has also proved useful in finance and 
economics, and in such biological applications as immunology. 

Control problems are most often formulated as differential
equations. These equations are generally of the form dx/dt = F(x, u),
where x represents the “state” of the system and u is a “control
parameter,” the means by which one hopes to steer the system
along a desired course. For example, x might be a vector repre- 
senting a car’s speed and direction, while u includes such variables 
as engine speed (which is controlled, in turn, by the force applied 
to the accelerator pedal) and steering-wheel position. 

Driving illustrates what control theorists call a closed-loop sys- 
tem. The driver constantly monitors the car’s motion and, based on 
what’s observed, adjusts the control parameter (by accelerating, 
braking or turning the steering wheel) to correct for minor devia- 
tions. By contrast, golf is mainly a matter of open-loop control: 
The golfer “reads” the green, putts the ball, and hopes for the best.
(Actually, there’s a lot of closed-loop control involving eye, brain, 
and musculature going on during the putt itself.) 
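
The difference between the two kinds of control shows up clearly in a toy simulation. The model below is hypothetical and chosen only for illustration: a one-dimensional "car" whose speed v obeys dv/dt = u - drag*v + disturbance, integrated with simple Euler steps. The drag coefficient, the headwind-like disturbance, and the feedback gain are all assumed values.

```python
TARGET = 20.0            # desired speed (arbitrary units)
DRAG = 0.1               # assumed drag coefficient
NOMINAL = DRAG * TARGET  # control that holds TARGET when nothing disturbs the car


def simulate(controller, disturbance=-0.5, dt=0.1, steps=2000):
    """Integrate dv/dt = u - DRAG*v + disturbance and return the final speed."""
    v = 0.0
    for _ in range(steps):
        u = controller(v)
        v += dt * (u - DRAG * v + disturbance)
    return v


def open_loop(v):
    """Ignore the observed speed: apply the pre-planned control."""
    return NOMINAL


def closed_loop(v):
    """Proportional feedback: push harder the further v is from TARGET."""
    return NOMINAL + 2.0 * (TARGET - v)


print("open loop:  ", round(simulate(open_loop), 3))
print("closed loop:", round(simulate(closed_loop), 3))
```

With the disturbance switched on, the open-loop plan settles well short of the target, while the feedback controller ends up within a few percent of it. Simple proportional feedback still leaves a small residual error, which is one reason practical controllers are more elaborate than this sketch.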

Mathematical control theory takes various forms. Optimal con- 
trol, for example, seeks the solution that’s best in some technical 
sense, such as maneuvering a spacecraft with a minimum of fuel. 
In robust control, on the other hand, the goal is to design strategies 
for handling systems whose states are not entirely known or mea- 
surable. The military, for one, has a keen interest in equipment that 
works reliably in unpredictable circumstances. 

Control theorists use a surprisingly wide range of mathematics, 
from the basics of linear algebra and complex analysis to advanced 
subjects such as Riemannian geometry and Lie algebras. In robot- 
ics, for example, it’s natural to describe the position of a multi- 
jointed “arm” with an n-dimensional vector of angles (at the 
“shoulder,” “elbow,” “wrist,” “finger,” etc.). Planning a robot’s
motions then calls for a combination of analysis, topology, and 
abstract algebra. 

Linear algebra is an essential tool in control theory, as it is 
throughout mathematics. When a system is linear, the job of figuring
out which inputs will produce a desired output is considerably
simplified. In effect, you can predict the behavior of a linear sys- 
tem by analyzing its response to small changes in individual vari- 
ables. Engineers often intentionally design systems to be linear 
because it makes life so much easier. When that’s not possible, 
they try to identify a range of conditions under which nonlinear 
effects are small. As long as things can be kept within that range, 
the system’s behavior is approximately what a linear model predicts.
That may sound like operating on a wing and a prayer, but it’s
how control theory keeps airplanes on a stable flight path. 
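
A standard textbook illustration of "a range of conditions under which nonlinear effects are small" is the pendulum, whose restoring force depends on sin(theta) but is routinely modeled as depending on theta itself. The pendulum is an example brought in here, not one used in the article; the sketch below simply measures how good the linear stand-in is at various angles.

```python
from math import radians, sin


def linearization_error(angle_deg: float) -> float:
    """Relative error of replacing sin(theta) by theta (theta in radians)."""
    theta = radians(angle_deg)
    return abs(theta - sin(theta)) / sin(theta)


for angle in (1, 5, 10, 30, 60):
    print(f"{angle:>3} degrees: relative error {linearization_error(angle):.4%}")
```

For swings of a few degrees the linear model is off by a fraction of a percent; by sixty degrees the error is around a fifth, and a controller designed around the linear model can no longer be trusted.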

Among the most powerful and widespread applications of linear 
algebra in control theory is the Kalman filter. Introduced in 1960 
by Rudolf Kalman, then at the Research Institute for Advanced 
Studies in Baltimore, and refined by Kalman and Richard Bucy at 
the Johns Hopkins Applied Physics Laboratory, the Kalman filter is 
a mathematical algorithm that makes optimal use of imprecise data 
on a linear system to continuously update a “best” estimate of the
system’s current state. Early applications sprang up mainly in the 
aerospace industry—NASA and the Pentagon recognized in the 
1950s the need for highly accurate tracking and targeting systems 
for satellites and missiles—but today’s applications reach almost 
everywhere, from inertial navigators on airplanes to analyses of 
economic markets. 

“Anything that moves, if it’s automated, is a candidate for a 
Kalman filter,” says Blaise Morton, a control theorist at Honeywell
Inc. in Minneapolis. The only requirements are that the system be 
linear (or nearly so) and that the noise in the data have a bell-shaped 
distribution (in statistics, this is analogous to the data being linear). 
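
In its simplest form, with a single number as the state, the Kalman filter fits in a dozen lines. The sketch below is a minimal scalar version written for this illustration, not code from any of the systems mentioned: the function name, the default parameters, and the test signal (a constant value of 5 observed through bell-shaped noise) are all assumptions.

```python
import random


def kalman_1d(measurements, meas_var, process_var=0.0, x0=0.0, p0=1e6):
    """Scalar Kalman filter for a (nearly) constant state observed with noise.

    x is the current estimate and p its variance; a large p0 says the
    initial guess x0 should carry almost no weight.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p += process_var        # predict: uncertainty grows between readings
        k = p / (p + meas_var)  # Kalman gain: how much to trust the new reading
        x += k * (z - x)        # update the estimate toward the measurement
        p *= (1.0 - k)          # the update shrinks the uncertainty
        estimates.append(x)
    return estimates


rng = random.Random(0)
true_value = 5.0
readings = [true_value + rng.gauss(0.0, 1.0) for _ in range(500)]
estimates = kalman_1d(readings, meas_var=1.0)
print("first reading:", round(readings[0], 3))
print("final estimate:", round(estimates[-1], 3))
```

Each individual reading is off by about one unit, yet the running estimate homes in on the true value. With no process noise the filter reduces to a running average; a positive `process_var` lets it track a state that drifts.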

Nonlinear mathematics can’t be swept under the rug entirely, 
however. A lot of the latest control theory aims not to suppress the 
tendency to stray from the straight and narrow, but to make use of 
it. (As Lao-tzu put it, “If you want to get rid of something, you 
must first allow it to flourish.”*) Philip Holmes at Princeton
University and colleagues John Lumley at Cornell University and
Brianno Coller, now at Caltech, for example, are studying control
of nonlinear systems with instabilities known as heteroclinic
orbits—trajectories that connect one point of unstable equilibrium 
to another. As a simple example, imagine trying to balance a ping- 
pong ball at the end of a popsicle stick (see Figure 2, next page). 
By counteracting any small motion of the ball with a small movement
of the popsicle stick (that is, by applying control) you can
keep things steady for a fairly long time. But if the ball ever does
manage to roll off the stick (perhaps because you sneezed), your 
best bet for recapturing it is to let it bounce off the floor and return 
close to its original position, where you can slide the popsicle stick 
back under it. 

Holmes’s group is interested in a much more complicated
application: controlling turbulence in fluid flows in boundary layers
over a surface such as an airplane wing. One source of drag (the
“bad” force in aerodynamics) is the “bursting” of tiny vortices that
occurs as air flows over the wings. By combining theoretical
analyses and computer simulations, Holmes and his colleagues have
shown that it’s possible to reduce the amount of bursting by
continuously making tiny, local adjustments to the shape of the
wing. The results are still preliminary, Holmes says. For one thing,
the theoretical work has been done only for a simplified,
low-dimensional model of fluid flow; the full-fledged problem is a
lot harder. But the basic idea seems to work, at least numerically:
Use linear control theory to keep the system near equilibrium as
long as possible, but when bursting does occur, rely on the
underlying nonlinear dynamics to take the system to another point
of equilibrium.

Figure 2. Balancing a ping-pong ball on a popsicle stick is a tricky
exercise in nonlinear control.

Controlling turbulence by changing the shape of an airplane
wing may sound far-fetched, but engineers have come up with a
new technology, called micro-electro-mechanical systems
(MEMS), that might do exactly that. Tiny sensors and actuators in
the form of “microflaps” mounted on a chip can be programmed to
change the microflaps’ positions in response to fluid flow measure- 
ments. MEMS enthusiasts have proposed putting thousands of the 
tiny devices on airplane wings. Laboratory experiments suggest 
the idea may work. The Cornell group’s theoretical results support 
the same conclusion; more important, they provide a guide for 
MEMS designers to follow—if they pay attention. 

Unfortunately, control theory is often the last thing design engi- 
neers worry about. “There is a tendency to just assume that the 
control design is already done, off the shelf,” says John Burns, a
control theorist at the Virginia Polytechnic Institute and State 
University. Part of the reason, according to Burns, is that control 
engineers (the people who apply control theory to practical prob- 
lems) have been all too good at what they do, leaving the impres- 
sion that control is the easy part of the job. In fact, control engi- 
neers must often work hard to compensate for being left out of the 
design loop. For example, design engineers typically decide in 
advance how many sensors and actuators to use, and where to put 
them. Control engineers then have to do the best they can with the 
cards they’ve been dealt. 

So far, it’s worked out. “Working control engineers have been 
very good at designing good controllers with bad models,” Burns
says. But as systems get more complicated (as when thousands
of little flaps all try to coordinate their activities), control theory
may finally balk at the demands. It would be far smarter to put the 
cart behind the horse (something you’d think people with a college 
degree would know anyway), and include control-theoretic consid- 
erations in the early stages of design. Sometimes, says Burns, the 
theory can and should outpace practice; he urges his colleagues to 
“build your best controller and see if the technology can catch up.” 

They may also want to brush up on their Lao-tzu: 


The Master allows things to happen. 
She shapes events as they come. 
She steps out of the way
and lets the Tao speak for itself.*


Pattern recognition is
something brains do 
astonishingly well. But 
intelligent behavior of 
any sort has proved 
devilishly difficult to 
program. 

Figure 3. Top: A cross-sectional image from one brain (left) is deformed by a mathematical process called brain warping
(right) to resemble the image of a similar section from another brain (middle). Bottom: The effect of warping on an
underlying grid (left); without warping, there is considerable difference between the two images (middle); warping 
removes almost all the difference (right). (Figure courtesy of Michael Miller et al., Department of Electrical 
Engineering, Washington University. Cortical brain slice images provided by David Van Essen, Department of Anatomy 
and Neurobiology, Washington University.) 


WHAT’S HAPPENING IN THE 
MATHEMATICAL SCIENCES 




Analyzing images is 
just one of the brain’s 
natural abilities the 
CICS researchers 
hope to mimic. 

Figure 4. Images from a database of textbook MRI and PET images (left) are deformed by a mathematical process
called brain warping (right) to resemble images from a patient (middle). “Smart” algorithms such as brain warping are 
being developed to assist in medical technology and other fields where pattern recognition plays an important role. 
(Figure courtesy of Michael Miller et al., Department of Electrical Engineering, Washington University. Image data pro- 
vided by Marcus Raichle, Department of Neurology and the Mallinckrodt Institute of Radiology, Washington University.) 

“Singularity.” A frozen moment of fluid flow, rendered in white plastic, by sculptor Mel Fisher. (Photo courtesy of Mel
Fisher and Will Brown, photographer.)

Computational Fluid
Dynamics—Verging 
on Turbulence 


Fluid flow is one of nature’s most fascinating phenomena. The
sparkle of water dancing over rocks, the ooze of honey from a
jar, the upheaval of storm clouds and the terrifying descent of a
tornado, the elegance of wind filling a sail and the annoying drift of
cigar smoke in a restaurant: these are all familiar examples of fluid
dynamics. Harder to picture but equally intriguing are the
drag-inducing turbulence of air streaming over and under the surface
of an airplane wing, the global flows known as weather, and the
convective processes within the Earth and Sun, which drive their
magnetic fields and give rise to solar flares and volcanic eruptions.

Understanding fluids and how they move is not just a riddle for
scientists; it’s a practical problem of daily life. We breathe air and
drink water. In trying to move things quickly, we must contend 
with the resistance of wind and water. Sometimes we need to move 
fluids themselves (such as water, oil, and natural gas) cheaply and 
efficiently. Fluid dynamics is a problem of utmost importance. 


It’s also one of the hardest problems around. 

The basic equations that describe fluid motion, the so-called 
Navier-Stokes equations, have been around for over a hundred 
years. Related equations have been derived for phenomena such as 
combustion and turbulent diffusion. None of these equations looks 
terribly complicated; they’re all systems of partial differential equa- 
tions that express such basic facts as the conservation of mass, 
energy, and momentum. But appearances can deceive. Just below 
the surface, the equations of fluid flow conceal a sea of mathemat- 
ical difficulties. 
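
For reference, the equations can be written out in one common textbook form. The version below is an assumption about which form is meant: it covers an incompressible fluid of constant density $\rho$ and kinematic viscosity $\nu$, with no external forces, and the symbols ($\mathbf{u}$ for velocity, $p$ for pressure) are standard notation rather than anything taken from the article.

```latex
\[
\frac{\partial \mathbf{u}}{\partial t}
  + (\mathbf{u}\cdot\nabla)\,\mathbf{u}
  = -\frac{1}{\rho}\,\nabla p + \nu\,\nabla^{2}\mathbf{u},
\qquad
\nabla\cdot\mathbf{u} = 0 .
\]
```

The first equation expresses conservation of momentum and the second conservation of mass. The term $(\mathbf{u}\cdot\nabla)\mathbf{u}$ involves the velocity multiplied by its own derivatives, which is what makes the system nonlinear.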

A closer look at fluid flow suggests some of the complications. 
Just watch the cascade and crash of water in a fountain. 
Turbulence, in which a gas or liquid seems to go every which way 
at once, seems almost by definition to defy analysis. So far, indeed, 
it’s eluded even the most sophisticated theories. 

“Considering how long we’ve worked on turbulence, it may well
prove one of the most difficult areas in science,” says Paul
Dimotakis, a professor of aeronautics and applied physics at
Caltech. “It certainly is one of the twentieth-century
embarrassments in physics.”

Figure 1. The velocity of fluid flowing smoothly through a pipe is
greatest at the center and zero at the walls.

Mathematically, the source of the difficulty 
is a nonlinear term in the Navier-Stokes equa- 
tions. In settings where this term stays small— 
such as when water flows slowly through an 
ordinary, cylindrical pipe—the nonlinearity is 
not a problem and the equations can be solved 
(see Figure 1). But when the nonlinearity kicks 
in, as it does when the flow speeds up, not only 
do exact, analytic techniques fail to solve the 
equations, but even numerical methods for 
obtaining approximate solutions become sus- 
pect. Even in the ostensibly simpler case of 
two-dimensional fluid flow, the nonlinear diffi- 
culties remain. The problems are severe enough 
that theorists aren’t even sure that solutions 
necessarily exist for three-dimensional flows; 
there is no mathematical proof that the motions 
described by the Navier-Stokes and many other 
equations of fluid dynamics don’t degenerate 
into discontinuous nonsense. 
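
The slow pipe flow mentioned above is one of the few cases with an exact solution, and it can be stated in a line. The formula below is the classical textbook result for steady flow in a cylindrical pipe of radius $R$; the symbols $G$ (the pressure drop per unit length) and $\mu$ (the viscosity) are notation introduced here, not in the article.

```latex
\[
u(r) = \frac{G}{4\mu}\,\bigl(R^{2} - r^{2}\bigr),
\qquad 0 \le r \le R,
\qquad
u(0) = \frac{G R^{2}}{4\mu}, \quad u(R) = 0 .
\]
```

The velocity profile is a parabola: largest on the axis of the pipe ($r = 0$) and zero at the wall ($r = R$). In this flow the velocity does not change along the direction of motion, so the nonlinear term vanishes and the equations reduce to a linear problem.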

Nevertheless, researchers have come a long 
way in applying computer power to solve some 
of the seemingly intractable problems posed by 
the Navier-Stokes equations and their kin. 
Aeronautical engineers, for example, have 
sharply reduced the need for expensive wind- 
tunnel testing. Combining computer-assisted 
geometric design to model surfaces mathemat- 
ically and numerical methods to simulate the 
aerodynamics, engineers can now evaluate pre- 
liminary designs of wings, engines, and other 
airplane parts. (The Boeing 777 has been tout- 
ed as the first airplane designed entirely by 
computer.) Meteorologists, too, have improved
their ability to solve the equations that describe
next week’s weather, and environmental scientists
understand better how pollution spreads.
Computational fluid dynamics (CFD to its
friends) involves more than running the same
old programs on newer and faster machines.
Much of the progress has come from new
mathematical methods that enable existing
computers to do a better job. “One can very clearly
trace the speed-up in calculations” over the last
50 years, says Peter Lax, an applied mathemati- 
cian at the Courant Institute of Mathematical 
Sciences, who has observed and contributed to 
developments in the field. “About half the 
speed-up is due to the enormously faster speed 
of computers. The other half is due to improved 
methods of computing. Neither alone would 
have done it.” 

In theory, with unlimited computing power 
almost any differential equation can be solved 
numerically, to any desired degree of accuracy. 
The straightforward approach, called direct 
numerical simulation (DNS), is similar to esti- 
mating the area of a geometric shape by placing 
a grid over it and counting the number of 
squares it occupies: the finer the grid, the better the estimate
(see Figure 2). For a partial differential equation, the analogous grid must
extend in as many directions as there are vari- 
ables, and a numerical value is assigned to each 
grid cell. The problem then boils down to solv- 
ing an equation that relates the value assigned 
to each grid cell to the values assigned to 
neighboring cells: that is, a “simple” matter of
algebra. 

The trouble is, if the solution of the partial 
differential equation is at all complicated, the 
grid must be extremely fine, which leads quick- 
ly to an overwhelming amount of algebra. 
Thus, although in principle DNS can “solve” a 
fully turbulent flow, in practice the amount of 
computation soon skyrockets past the capabili- 
ties of any foreseeable supercomputer. 
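
The grid-counting analogy is easy to try out. What follows is a minimal sketch in Python, not a fluid solver: the names `grid_area` and `unit_disk`, and the choice to count a cell when its center lies inside the shape, are assumptions made purely for illustration.

```python
import math


def grid_area(inside, lo, hi, n):
    """Estimate an area by counting grid cells whose centers lie inside a shape.

    inside(x, y) is a hypothetical membership test for the shape; the square
    [lo, hi] x [lo, hi] is divided into n x n cells.
    """
    h = (hi - lo) / n
    count = 0
    for i in range(n):
        for j in range(n):
            x = lo + (i + 0.5) * h  # center of cell (i, j)
            y = lo + (j + 0.5) * h
            if inside(x, y):
                count += 1
    return count * h * h


def unit_disk(x, y):
    """Illustrative shape: a disk of radius 1, whose true area is pi."""
    return x * x + y * y <= 1.0


coarse = grid_area(unit_disk, -1.0, 1.0, 8)    # 64 cells to visit
fine = grid_area(unit_disk, -1.0, 1.0, 256)    # 65,536 cells to visit
coarse_error = abs(coarse - math.pi)
fine_error = abs(fine - math.pi)
print(coarse, fine, coarse_error, fine_error)
```

Each halving of the cell size quadruples the number of cells in two dimensions, which hints at why the “algebra” grows so quickly once a grid must also cover a third dimension and time.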

Figure 2. When approximating the area of a geometric figure by counting
grid squares inside the figure, a fine grid (bottom) does better than a
coarse grid (top).

Figure 3. A multigrid.

Fluid dynamicists use a quantity called the Reynolds number to describe
the complexity of a flow. The Reynolds number is a dimensionless
combination of easily measured quantities; for flow in pipes, it is the
product of the diameter of the pipe and the average velocity of the
flow, divided by the viscosity of the fluid.

As a rule of thumb, the amount of computation in a direct 
numerical simulation increases by an order of magnitude with each 
doubling of the Reynolds number. With existing computer tech- 
nology and algorithms, DNS 
does OK for flows with 
Reynolds number up to around 
10,000. That sounds impressive, 
until you realize that that’s 
around the Reynolds number you 
get in your plumbing when you 
turn on a bathroom faucet to 
wash your hands. (Water has a
viscosity of approximately
10⁻² cm²/sec; a faucet 1 centimeter
in diameter can fill a 1-liter
measuring cup in about 10 seconds,
which means the average
velocity of the water as it leaves
the tap is about 100 cm/sec.)
The flows engineers must con- 
tend with in designing airplanes 
or in cooling a nuclear power 
plant have Reynolds numbers in 
the tens of millions. 
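
The faucet arithmetic and the rule of thumb can be checked in a few lines. This is a sketch under stated assumptions: the function names are invented here, and `dns_cost_factor` simply takes the "order of magnitude per doubling" rule at face value rather than modeling any real solver.

```python
import math


def reynolds_number(diameter_cm, velocity_cm_per_s, viscosity_cm2_per_s):
    """Pipe-flow Reynolds number: diameter times average velocity over viscosity."""
    return diameter_cm * velocity_cm_per_s / viscosity_cm2_per_s


# The faucet: 1 liter (1000 cm^3) in about 10 seconds through a 1 cm opening.
diameter = 1.0                                    # cm
flow_rate = 1000.0 / 10.0                         # cm^3 per second
cross_section = math.pi * (diameter / 2.0) ** 2   # cm^2
velocity = flow_rate / cross_section              # cm/s; "about 100" in round numbers
water_viscosity = 1e-2                            # cm^2/s, as quoted in the text

re_faucet = reynolds_number(diameter, velocity, water_viscosity)
re_rounded = reynolds_number(diameter, 100.0, water_viscosity)


def dns_cost_factor(re_from, re_to):
    """Rule of thumb from the text: ten times the work per doubling of Re."""
    doublings = math.log2(re_to / re_from)
    return 10.0 ** doublings


print(round(re_faucet), round(re_rounded))
print(f"{dns_cost_factor(1e4, 1e7):.2e}")
```

Going from a Reynolds number of 10,000 to 10 million is about ten doublings, so by this rule the work grows by a factor on the order of ten billion.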

Faced with such overwhelm- 
ing computational demands, flow researchers have looked to 
sophisticated mathematical theories for help. Some of the advances 
concern purely numerical techniques that apply to general classes 
of differential equations. Multigrid methods, for example, boost 
computational efficiency by using a hierarchy of grids. The basic 
idea is simple: In regions where things change rapidly from point 
to point, you need lots of grid points, but in regions where things 
change gradually, a coarser grid is enough. Roughly speaking, the 
multigrid approach starts each time step with a coarse approxima- 
tion, and then refines it where necessary using finer grids in regions 
it identifies as “interesting.” Sometimes the grid itself gives a good 
picture of what’s going on (see Figure 3). 


Other theoretical advances are specific to fluid dynamics and 
turbulence. Large eddy simulations and transport models have 
been especially helpful in boosting the accessible range of 
Reynolds numbers. Large eddy simulations are based on hypotheses
that relate properties of “observable” eddies (swirls of flow
that cover many cells of the computational grid) to the effects of
eddies that are too small to be seen. Similarly, transport models 
are based on physical principles 
of energy transfer among phe- 
nomena at different scales. 
(Nonlinearity is again responsi- 
ble: If the problem were linear, 
each scale could be treated sepa- 
rately.) 

Researchers usually test new 
models by comparing their
results against DNS results at low
Reynolds numbers. Steven
Orszag and colleagues at
Princeton University’s Fluid 
Dynamics Research Center have 
applied this benchmark to several 
new models for the pipe-flow 
problem. They have developed 
new DNS techniques that take 
advantage of the circular cross 
section of the pipe. They have 
also developed new analyses for 
how viscosity dissipates energy 
in turbulent flows. 

The Princeton researchers are also comparing their computers’ 
predictions directly to experimental results. Highly turbulent 
flows, while easy to create, are difficult to control in a reproducible 
manner. Nevertheless, Alexander Smits and Mark Zagarola in the 
Aerospace and Mechanical Engineering Department at Princeton 
are doing just that, with a design they call Superpipe (see Figure 4). 
Weighing in at 28 tons, with two-inch thick walls of carbon steel 
housing a five-inch diameter aluminum test pipe, Superpipe pumps 
highly compressed air around a 200-foot circuit, past test probes 
that can measure details of the flow down to the level of microns. 
By working with air that’s been compressed to as much as 240
atmospheres, the Princeton researchers can obtain Reynolds numbers
in the tens of millions.

Figure 4. Researchers at Princeton University’s Fluid Dynamics Research
Center study high-Reynolds number flows with Superpipe. (Photo courtesy
of Steven Orszag and Mark Zagarola, Princeton University.)

“The goal is to tie experiment together with theory and computation,”
Orszag explains. So far they’ve found good agreement between
experimental results and direct numerical simulations for Reynolds
numbers up to around 10,000 and with their transport models up to
Reynolds numbers of 50 million. Significantly, experiments have
indicated a change in the nature of turbulence at Reynolds numbers
above a million. Superpipe data suggest that the coefficient of
friction, an important parameter for engineering applications, obeys a
new scaling law for Reynolds numbers in the 1 to 50 million range (see
Figure 5). Orszag and colleagues believe the explanation will be found
in a closer analysis of spatial fluctuations in the way viscosity
removes energy from the turbulent flow.

“Any time you find a scaling regime, it says there’s fundamental
physics to understand,” notes Orszag. In this case, he adds, the
scaling is especially interesting because it connects an observable
engineering parameter with some of the more esoteric structure of
turbulence at very small scales.

Figure 5. Comparison of new friction factor data with the relation
proposed by Prandtl (friction factor plotted against Reynolds number).
Note the departure of the experimental data from Prandtl’s relation at
a Reynolds number of approximately 3 × 10⁶. (Figure courtesy of
S. Orszag, A. Smits, and M. Zagarola.)

Dimotakis also emphasizes the importance of experiments in supplying
new grist for the computational mill. He and colleagues at Caltech have
designed equipment that will record digital images of fluid flow
experiments, at a rate of a thousand frames per second. With each frame
consisting of 1000 x 1000 pixels, “it’s going to require the largest
computational resources just to look at the data,” says Dimotakis. The
Caltech group did a test run in 1994 with a less capable prototype,
imaging surfaces of constant concentration in the turbulent mixing of a
jet of one fluid into another. Such “isoscalar” surfaces are important
in understanding processes such as combustion, which occur at places
where variables like temperature, pressure, or concentration attain
certain values. Researchers wonder, for example, whether these highly
wrinkled and convoluted surfaces have a fractal geometry. But the
analysis is about as complicated as the surfaces themselves. “The
experiment lasted a little over ten seconds, and we have spent a year
trying to understand what the three-dimensional surfaces look like,”
says Dimotakis.

Andrew Majda, an applied mathematician at the Courant 
Institute of Mathematical Sciences, is studying turbulent diffusion 
from a theoretical standpoint. He and colleagues have developed 
powerful new analytic and numerical techniques for handling the 
mathematical equations that describe such complex phenomena as 
the mixing of chemicals or the spread of pollutants released from a 
smokestack. 

Ordinary molecular diffusion occurs when one substance (some- 
times heat) spreads out through another by a kind of “drunkard’s 
walk,” similar to what might happen if 10,000 inebriated football 
fans got in their cars and drove off after the big game, making ran- 
dom turns at every corner. At first, the cars would be concentrated 
near the stadium, but eventually they would spread more or less 
uniformly throughout the city. Turbulence, which can be thought 
of as random variations in congestion that speed up or delay traffic 
at varying times and places, tends to accelerate diffusion, some- 
times by many orders of magnitude. Cigarette smoke, for example, 
might not annoy non-smokers if it weren’t for turbulence, which is 
present even in “still” air. Without turbulence, the unpleasant 
fumes would take hours, rather than seconds, to permeate a crowd- 
ed room. 
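
The “drunkard’s walk” picture of ordinary diffusion can be simulated directly. The sketch below is only an illustration: it assumes a square street grid with one random turn per block, and the name `mean_square_displacement` and the walker counts are made up for the example. It models plain diffusion only, with no turbulence.

```python
import random

# The four possible moves at each corner of an idealized street grid.
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def mean_square_displacement(n_walkers, n_steps, seed=0):
    """Average squared distance from the start after n_steps random moves."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_walkers):
        x = y = 0
        for _ in range(n_steps):
            dx, dy = rng.choice(MOVES)
            x += dx
            y += dy
        total += x * x + y * y
    return total / n_walkers


msd_100 = mean_square_displacement(2000, 100, seed=1)
msd_400 = mean_square_displacement(2000, 400, seed=2)
print(msd_100, msd_400, msd_400 / msd_100)
```

For an unbiased walk of this kind the mean square distance grows in proportion to the number of steps, so quadrupling the time should roughly quadruple it. That slow, linear growth is the baseline that turbulence can beat by orders of magnitude.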

Many features of turbulent diffusion obey scaling laws. One 
such law was hypothesized in the 1920s by the English meteorologist
L.F. Richardson. Richardson’s “t-cubed law” says that the average
distance between nearby particles (dust motes, for instance) grows
with the three-halves power of time. (More precisely, the mean square
distance at time t is proportional to t³.) In
the early 1940s, the Russian mathematician Andrei Kolmogorov 
identified another scaling law, predicting that the average differ- 
ence in velocity between two points of a turbulent medium is pro- 
portional to the cube root of the distance between the points. 
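
In symbols, the two laws can be written as follows. The notation is an assumption introduced here for illustration: r(t) is the separation of a pair of particles at time t, δv(ℓ) is the velocity difference between two points a distance ℓ apart, angle brackets denote averages, and C stands for the constant of proportionality in suitably scaled units.

```latex
% Richardson: mean square separation grows like t^3,
% so the typical separation grows like t^{3/2}.
\langle r^2(t) \rangle = C\, t^3
\quad\Longrightarrow\quad
\sqrt{\langle r^2(t) \rangle} \;\propto\; t^{3/2}

% Kolmogorov: typical velocity difference across a distance \ell.
\langle |\delta v(\ell)| \rangle \;\propto\; \ell^{1/3}
```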


Andrew Majda. (Photo courtesy of Robert P. Matthews and Princeton
University.)

Richardson’s and Kolmogorov’s laws have been well borne out
by experiments over the years, although researchers have never 
been able to observe the scaling behaviors over more than a few 
orders of magnitude at a time. Computational studies have also 
supported the two laws, but limits on computing power have 
restricted the range of scales that researchers can look at in a given 
computation. Richardson’s law has been particularly troublesome: 
Although researchers are confident that three-halves is the correct 
power for the scaling law, estimates for the constant of proportion- 
ality have varied from as little as 0.1 in some studies, to as much as 
3.5 in others. 

Majda’s group may have settled the issue at last. Borrowing 
ideas from quantum field theory and statistical physics, Majda and 
Marco Avellaneda at the Courant Institute have developed theoretical
tools for analyzing models of the random velocity fields that
characterize turbulence. Their approach is based on a hierarchy of
models, including one that is simple enough to be solved exactly.
While not directly useful as a model for any physical process, the
exactly solved example provides an “unambiguous numerical test
problem,” Majda explains. More recently, Majda and Frank Elliott
at Princeton University (now also at Courant) have developed
computational tools, including a new method of generating
“realizations” of the random velocity fields, that allow them to study
a wide range of scales all at once.

Figure 6. Monte Carlo simulations with 100 (blue curve) and 1000 (black
curve) realizations of the velocity field give an estimate of about 1.45
for the Richardson constant. (Figure courtesy of Andrew Majda and Frank
Elliott.)

The results are encouraging. For their most general model, 
Majda and Elliott have confirmed Richardson’s and Kolmogorov’s 
laws for 11 and 12 orders of magnitude, respectively. That’s like 
saying the laws hold good on scales ranging from millimeters to 
millions of kilometers, or seconds to centuries. Their computations 
also indicate a “benchmark” value of about 1.45 for the propor- 
tionality constant in Richardson’s law (see Figure 6). 

The two scaling laws, though, are “just a warm-up” for future 
applications of the new methods, Majda says. He has an eye on 
other open problems that are complicated by a multitude of scales. 
Cloud physics, he notes, is one especially interesting and important 
area where his group’s “numerical laboratory” can test the predictions
of competing models.

A complete understanding of the Navier-Stokes equations and 
related equations for fluid flow is unlikely to emerge any time soon. 
Still, researchers’ growing knowledge of fluid dynamics and how to 
compute it has given them greater respect for what theorists accom- 
plished before the age of high-speed computers. Indeed, says Lax, 
“the amazing outcome of the last 40 or 50 years of development of 
computational fluid dynamics is how well those equations describe 
fluid flow.” 


Figure 7. The smoothing out of a solution to the “lubrication equation.”
(Figure courtesy of Andrea Bertozzi.)

Cellular Automata Offer New Outlook on Life, the Universe, and Everything

Figure 1. Domain walls and criss-crossing “particles” on a richly
textured background are features of one-dimensional cellular automata
based on Rule 54 (see page 78). (Figure courtesy of Erica Jen, created
using SigniScope™ software package, courtesy of Signition, Inc.)


What kind of world do we live in? The question has been bandied about
for thousands of years by philosophers, theologians, and politicians.
More recently, a spectrum of talk show hosts have weighed in on the
subject. So far, no one’s come up with an answer that everyone can
agree on.

Mathematicians have considered the same question. But where others
worry over the blurred boundaries of Good and Evil, mathematicians
ponder a sharper dichotomy: the Continuous versus the Discrete.

Continuous mathematics, exemplified by calculus 
and differential equations, has long dominated math- 
ematical descriptions of the world. But discrete 
mathematics is making a bid for primacy. With mod- 
ern computers, researchers have discovered astonish- 
ingly complex behavior in seemingly simple, finite 
systems. The results have led some theorists to spec- 
ulate that discrete models, which lend themselves to 
digital computation, are the “right” way to study 
nature. 

Erica Jen, a mathematician at Los Alamos 
National Laboratory in Los Alamos, New Mexico, is 
one of a growing number of researchers who believe that discrete 
mathematics can mirror many aspects of physical reality fully as 
well as the more customary continuous theories. Jen has been 
studying mathematical properties of discrete systems known as cel- 
lular automata. These systems, she says, are useful models for 
many types of complex physical, chemical, or biological systems. 
They also have an amazing life of their own. 

Cellular automata “exhibit an extremely rich and diverse range 
of pattern formation,” Jen says. Among the most interesting are 
“self-organizing” patterns: highly structured features that seem to
emerge spontaneously from a “primordial soup” of random binary
digits. Jen and her colleagues hope to understand exactly how
these patterns arise and precisely what properties they possess. By
studying cellular automata with mathematical tools from areas such
as abstract algebra and number theory, Jen hopes to bring theoretical
rigor to a subject that is often as much art as science.

Erica Jen. (Photo courtesy of Erica Jen.)

Figure 2. A 100 x 100 “majority vote” cellular automaton proceeds from
a random initial state (top) to a final state (bottom right, facing
page). On each ballot, every cell looks at the cells around it, and
changes color value if its current value is in the minority. Most cells
have 8 neighbors, but cells on the edges have 5 neighbors, and corner
cells only 3. Some features of the final state take shape with the
first round of voting (middle).

Loosely speaking, a cellular automaton is a “pixelization” of
space and time: Instead of varying continuously from point to
point and moment to moment, cellular automata consist of discrete
“cells” with discrete values that change instantaneously at discrete
intervals, much like frames in a movie. The crucial feature, more- 
over, is a rule that prescribes exactly how each cell’s value changes 
depending on the values of nearby cells. 

One possible rule, for example, is a “majority vote”: Each cell 
in a system of black and white squares could be programmed to 
switch color if the majority of its immediate neighbors are of the 
opposite color (see Figure 2). Another rule might specify that the 
value of each cell change to the sum of the values of the cells sur- 
rounding it—or, reducing things to black and white again, to the 
parity of the sum (black could be odd and white even). 
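
Both rules fit in a few lines of code. The sketch below is an illustration rather than the program behind Figure 2: the function names are invented, cells are stored as 0 (white) and 1 (black), edge and corner cells simply have fewer neighbors, and a tie among the neighbors is assumed to leave a cell unchanged.

```python
def neighbors(grid, r, c):
    """Values of the cells touching (r, c): 8 in the interior, 5 on an edge, 3 in a corner."""
    rows, cols = len(grid), len(grid[0])
    return [
        grid[rr][cc]
        for rr in range(max(0, r - 1), min(rows, r + 2))
        for cc in range(max(0, c - 1), min(cols, c + 2))
        if (rr, cc) != (r, c)
    ]


def majority_vote_step(grid):
    """One ballot: a cell switches color if most of its neighbors have the other color."""
    new = []
    for r, row in enumerate(grid):
        new_row = []
        for c, value in enumerate(row):
            around = neighbors(grid, r, c)
            opposite = sum(1 for v in around if v != value)
            # Strict majority required; a tie leaves the cell alone (an assumption).
            new_row.append(1 - value if 2 * opposite > len(around) else value)
        new.append(new_row)
    return new


def parity_step(grid):
    """Each cell becomes the parity (sum mod 2) of the values surrounding it."""
    return [
        [sum(neighbors(grid, r, c)) % 2 for c in range(len(row))]
        for r, row in enumerate(grid)
    ]


lone_black = [[0, 0, 0],
              [0, 1, 0],
              [0, 0, 0]]
after_vote = majority_vote_step(lone_black)
after_parity = parity_step(lone_black)
print(after_vote)
print(after_parity)
```

All cells are updated from the old grid at once, which is the synchronous updating that distinguishes a cellular automaton from a cell-by-cell sweep.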

“The essential features of cellular automata are that they are 
deterministic and discrete in space, time, and state values; they 
evolve according to local interaction rules; and these rules apply
synchronously and homogeneously across the system,” Jen
explains. These features accord well with standard physical 
assumptions about the uniformity of space and time (the laws of 
physics are the same everywhere) and the impossibility of instanta- 
neous action at a distance (nothing travels faster than the speed of 
light). They also lend themselves to modeling complex systems 
consisting of a large number of simple components that are locally 
connected. Perhaps most important, these features are tailor-made 
for digital computation. 

Cellular automata were first dreamt of in the early 1950s by 
John von Neumann and Stanislaw Ulam, as tools for studying bio- 
logical systems. In the late 1960s, John Conway, then at 
Cambridge University (now at Princeton), invented rules for a cel- 
lular automaton he called the Game of Life, which Martin Gardner 
popularized in his column for Scientific American. But computer 
technology was not really up to the job of exploring cellular 
automata until the 1980s, when color graphics workstations 
replaced the clattering teletype machines that traded alphanumeric 
symbols with a room-sized mainframe in another building. 

With today’s high-speed machines (fated, no doubt, to seem 
painfully slow in another few years), researchers can glimpse the 
complex patterns that often arise from the repeated application of 
the simple rules that define cellular automata. Fast computers 
allow experiments with relatively large systems: Automata with 
thousands of cells can be followed for hundreds of time steps on a 
personal computer; workstations and supercomputers can track sys- 
tems with millions of cells for thousands of time steps. 

Jen’s research focuses on a class of one-dimensional systems 
called “elementary” cellular automata. Each state of such a sys- 
tem is represented by a row of black and white pixels, correspond- 
ing to a string of 1’s and 0’s, and the update rule uses only the value 
of a given cell and the values of its two adjoining cells. (To simplify
the description, researchers often work with a “wrap-around”
model, in which the two ends are joined, so that all cells are treat- 
ed alike.) The evolution of a one-dimensional automaton is conve- 
niently displayed in a two-dimensional format, each new row below 
its predecessor. (Researchers also often “colorize” their elemen- 
tary systems to highlight key features.) The result can be as richly 
textured as a Navajo weaving. 

In the early 1980s, Stephen Wolfram, then at the Institute for 
Advanced Study in Princeton, roughed out a classification scheme
for elementary automata based on the two-dimensional patterns
that emerge from “random” initial states. Some rules, he noted, 
lead quickly to uninteresting, static behavior. The majority vote 
rule, for example, simply chips away at any segments of alternating 
values, and stops changing once it removes them. But other rules, 
Wolfram found, produce patterns that seem to reflect elements of 
both order and chaos. 

Wolfram also introduced a convenient notation for the rules of 
one-dimensional elementary cellular automata. Because every rule 
amounts to assigning a 1 or a 0 to each of the eight strings 111, 110,
101, 100, 011, 010, 001, and 000, each rule can be identified with 
an 8-bit binary number (see Figure 3). For example, the majority 
vote rule corresponds to the binary number 11101000, or 232 (= 
128+64+32+8). In particular, there are only 256 possible rules for 
one-dimensional elementary automata. (Wolfram cut the number 
to 32 by restricting attention to those rules that assign 0 to 000, thus 
leaving “empty space” empty, and respect left-right symmetry by 
assigning the same values to 100 and 001 and to 110 and 011.) The 
open-endedness of the subject comes from the fact that there can be 
arbitrarily many (or even infinitely many) cells. 
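
The numbering scheme translates directly into code. The following Python sketch is one possible reading of it, with invented function names and a wrap-around row: bit k of the rule number is the output for the neighborhood whose three cells spell k in binary.

```python
def rule_table(rule_number):
    """Map each neighborhood (left, center, right) to the output bit of the rule."""
    return {
        ((k >> 2) & 1, (k >> 1) & 1, k & 1): (rule_number >> k) & 1
        for k in range(8)
    }


def step(cells, rule_number):
    """Apply the rule once to a wrap-around row of 0's and 1's."""
    table = rule_table(rule_number)
    n = len(cells)
    # cells[i - 1] wraps to the last cell when i == 0.
    return [table[(cells[i - 1], cells[i], cells[(i + 1) % n])] for i in range(n)]


# Rule 232 outputs 1 exactly when at least two of the three cells are 1.
majority = rule_table(232)

# The restricted family described in the text: 000 -> 0, and left-right symmetry.
restricted = [
    r for r in range(256)
    if (r & 1) == 0
    and ((r >> 4) & 1) == ((r >> 1) & 1)   # 100 and 001 agree
    and ((r >> 6) & 1) == ((r >> 3) & 1)   # 110 and 011 agree
]
print(len(restricted))
print(step([0, 0, 1, 0, 0], 240))
```

Three of the eight bits are pinned down by the restrictions (one set to 0, two tied to their mirror images), which leaves five free bits and hence 32 rules.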

Computer experiments are commonly used to investigate the 
kinds of patterns that are associated with the 256 rules. 
Surprisingly, researchers have found that rules generate certain 
“generic” types of behavior: Except for specially engineered initial
conditions (for example, an initial state consisting of strictly
alternating 1’s and 0’s, or, worse, all 0’s), each rule will typically
generate patterns that are qualitatively similar regardless of the
initial condition chosen. For many rules, a single 1 in a sea of 0’s
is enough to spawn a complex pattern.

Figure 3. Each rule for an elementary one-dimensional cellular
automaton can be read as a base-2 expression for a number between 0 and
255. The leftmost column gives the common, base-10 name for five
especially interesting rules. (Column headings: 111, 110, 101, 100,
011, 010, 001, 000.)

Figure 4. A Rule 18 “particle” drifts left and right.

But computer experiments alone don’t
prove anything. For one thing, computa- 
tions are always done on systems of limited 
size, but the conjectures assert that the 
observed behavior holds for all systems, no 
matter how large. In particular, claims of 
chaos based on computer simulations, while 
often persuasive, fall short of mathematical 
rigor, since the behavior of any cellular 
automaton is ultimately periodic: Because 
it’s finite, an automaton must eventually 
repeat one of its states. As soon as that hap- 
pens, the deterministic rule demands that 
the system simply cycle endlessly through 
an unvarying sequence of states—behavior 
that is the very antithesis of chaos. 
Computer experiments are also not particu- 
larly helpful in developing a theory that 
explains why rules generate the patterns 
they do, or why two rules are similar or dif- 
ferent in the patterns they generate. 
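
The periodicity argument can be watched in action on a small wrap-around system. This is a minimal sketch with invented names: it simply records every state it has seen and stops at the first repeat, which is practical only for small rows.

```python
def step(cells, rule_number):
    """One update of an elementary automaton on a wrap-around row (as a tuple)."""
    n = len(cells)
    return tuple(
        (rule_number >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    )


def find_cycle(initial, rule_number):
    """Return (transient, period): steps before the cycle starts, and its length."""
    first_seen = {}
    state = tuple(initial)
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state, rule_number)
        t += 1
    return first_seen[state], t - first_seen[state]


# A single 1 under the shift rule 240 circles a 5-cell ring.
shift_cycle = find_cycle([1, 0, 0, 0, 0], 240)

# Any rule on 10 cells must repeat within 2**10 steps; Rule 30 is an arbitrary choice.
transient, period = find_cycle([0, 0, 0, 0, 1, 0, 0, 0, 0, 0], 30)
print(shift_cycle, transient, period)
```

A row of n cells has only 2ⁿ possible states, so the loop is guaranteed to end. The same count shows why the guarantee says little in practice: for even a few hundred cells the bound is astronomically large.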

“Trivial” systems, of course, don’t
require much proof. For example, Rule 240
(= 11110000), which tells each cell to adopt 
the value of the neighbor on its left, obvi- 
ously produces diagonal stripes of widths 
that depend only on the initial pattern of 1’s 
and 0’s. More interesting rules, however, 
pose real challenges to mathematicians. 

Through detailed analysis, Jen has provided mathematically rigorous
proofs for some of the observations about elementary automata. She and
colleagues, including Peter Grassberger at the University of Wuppertal
in Germany and James Crutchfield at the University of California at
Berkeley, have found subtle relationships between the behavior of
certain “simple” systems and patterns that arise in more complicated
automata.

One such relationship connects the patterns produced by Rule 
90 with those of Rule 18 and several other rules. While hardly triv- 
ial, Rule 90 is relatively simple in that it is “linear”: Each cell’s 
value changes at each step to the sum (mod 2) of the values on 
either side of it. Because of linearity, the behavior of Rule 90 can 
be studied using tools from linear algebra. Jen and others have 
studied linear automata extensively. 
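
Linearity here means that patterns superpose: evolving the cell-by-cell sum (mod 2) of two rows gives the sum of their separate evolutions. A short Python check, with invented helper names and a wrap-around row assumed, might look like this:

```python
import random


def rule90_step(cells):
    """Rule 90: each cell becomes the sum (mod 2) of its two neighbors."""
    n = len(cells)
    return [cells[i - 1] ^ cells[(i + 1) % n] for i in range(n)]


def add_mod2(a, b):
    """Cell-by-cell sum mod 2 of two rows of equal length."""
    return [x ^ y for x, y in zip(a, b)]


# The formula agrees with the bits of the number 90 for all eight neighborhoods.
matches_rule_number = all(
    ((90 >> (4 * left + 2 * center + right)) & 1) == (left ^ right)
    for left in (0, 1) for center in (0, 1) for right in (0, 1)
)

rng = random.Random(0)
a = [rng.randint(0, 1) for _ in range(32)]
b = [rng.randint(0, 1) for _ in range(32)]
superposes = rule90_step(add_mod2(a, b)) == add_mod2(rule90_step(a), rule90_step(b))
print(matches_rule_number, superposes)
```

Because of superposition, the evolution of any row is determined by what happens to a single 1, which is what opens the door to linear algebra.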

Although Rule 18 is nonlinear, Jen has shown that it is equiva- 
lent to Rule 90 in a very precise sense. In particular, given any ini- 
tial pattern consisting of isolated 1’s separated by odd numbers of 
0’s, such as 10100010100000, both rules produce exactly the same 
subsequent behavior. Jen found a way to track the effect of insert- 
ing additional 0’s into a Rule-18 pattern. This makes it possible to 
transform any initial pattern into one with isolated 1’s and odd- 
length strings of 0’s, use Rule 90 to “evolve” the modified pattern, 
and still recover the correct final pattern for Rule 18. The proce- 
dure, Jen notes, is reminiscent of a technique known as the inverse 
scattering method, which is used for solving the partial differential 
equations that give rise to solitons (see “New Wave
Mathematics” in What’s Happening in the Mathematical Sciences, 
Volume 2). 
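
The agreement between the two rules on the sample pattern is easy to confirm by brute force. The sketch below is a check, not Jen's proof: the helper names are invented, and the 14-cell pattern from the text is assumed to sit on a wrap-around row, so the gap that wraps around the ends (five 0's) is also odd.

```python
def step(cells, rule_number):
    """One update of an elementary automaton on a wrap-around row."""
    n = len(cells)
    return [
        (rule_number >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def evolve(cells, rule_number, steps):
    """Return the initial row followed by `steps` successive rows."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step(history[-1], rule_number))
    return history


initial = [int(ch) for ch in "10100010100000"]
same = evolve(initial, 18, 50) == evolve(initial, 90, 50)

# Two adjacent 1's break the pattern condition, and the rules part ways at once.
adjacent = [1, 1, 0, 0, 0, 0]
differs = step(adjacent, 18) != step(adjacent, 90)
print(same, differs)
```

The two rules differ only on the neighborhoods 110 and 011. On a pattern like this one, the 1's start on sites of one parity and hop to the other parity at every step, so two 1's are never adjacent and those neighborhoods never occur.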

One way to display the equivalence of Rules 18 and 90 without 
actually inserting the extra 0’s is to color-code the stretches where 
the 0’s would go. The result can be interpreted as “particles” drifting
lazily left and right as they move down the page (see Figure 4).
Occasionally two particles collide and annihilate each other; on 
other occasions three particles coalesce into one. What’s crucial is 
that the number of particles never increases. In a finite setting, the 
number of particles obviously stabilizes as soon as the automaton 
settles into a cycle. In an infinite setting, with a finite stretch of 1’s 
and 0’s embedded in a sea of 0’s, Jen has proved that the number of 
particles dwindles down to either one or zero, depending on 
whether the original pattern had an odd or even number of particles. 
In effect, the infinite stretches of 0’s act like force fields, nudging 
the particles toward the center of the expanding pattern (see Figure 5).

A number of rules share Rule 18’s mechanism of generating dif- 
fusively annihilating particles, and can also be mapped onto Rule 
90. Other rules, however, exhibit quite different particle-generating 
mechanisms. Rule 54 (= 00110110), for example, operates on random
initial patterns to rapidly produce a “background” pattern consisting
of the 4-cell “bricks” 1110 and 0010. Any pattern built exclusively
with these two bricks will persist, drifting to the right and toggling
the two bricks (see Figure 6). Breaks in the pattern, however, such as
an extra 1 or 0 stuck between two bricks, tend to form “domain walls”
that persist indefinitely. The domains “communicate” by sending
particles back and forth. (See Figure 1, page 70.)

Most researchers concentrate on the behavior of the walls and 
particles, treating the background as immaterial. By focusing on 
details of the background, however, Jen has been able to give rig- 
orous proofs for some of the observed behavior. For example, 
looking again at finite patterns embedded in a sea of 0’s, she has 
shown that for almost any background with two dislocations, the 
dislocations will drift infinitely far apart, as if there’s some repul- 
sive force between them. Jen conjectures that this is true in gener- 
al, but that—like much else about the world, whether continuous or 
discrete—remains to be proved. 

Figure 5. Two Rule-18 particles coalesce and annihilate each other.

Figure 6. Rule 54 exhibits simple behavior on any pattern consisting of
the two “bricks” 1110 and 0010.




Figure 7. A computational forest fire. (Figure courtesy of Richard
Durrett, Cornell University.)

Figure 8a. Pascal’s triangle mod 2. The familiar fractal-like figure
can be interpreted as a one-dimensional, “Rule 60” cellular automaton,
with each row shifted half a square to the left.

Figure 8b. Pascal’s triangle mod 2 with random mistakes. Strange things
happen if dark squares in each new row are randomly erased (in this
case, with probability 0.02). Oddly enough, random erasures lead to a
proliferation of dark squares. (Figures 8a and 8b courtesy of Richard
Durrett, Cornell University.)

Are Group Theorists Simpleminded?

Mathematicians love to make things simple. They’ll work hard to
simplify a subject, even to the point of pooling efforts to produce a
15,000-page proof that no one regards as anywhere close to simple.

That’s been the state of affairs since 1980 for the Classification 
Theorem for Finite Simple Groups. The theorem is a relatively 
simple statement about quintessentially simple algebraic objects, 
but proving it took the combined efforts of hundreds of mathemati- 
cians, working for almost 30 years. The proof sprawls over hun- 
dreds of papers, including some that have never been published. 
The idiosyncrasies in style, notation, and terminology of its diverse 
contributors make the proof so hard to follow that no one has ever 
tried reading every single page of it. In group theory circles, the 
result has come to be known as the Enormous Theorem, even 
though it’s the proof, not the theorem, that’s enormous. 

Mathematicians like simple definitions and theorems, but they 
want simple proofs, 
too. The Enormous 
Theorem has been a 
thorn in the side of 
group theory for the 
last 15 years. That 
looks likely to change, 
however. A small
team of mathemati- 
cians, led by Richard 
Lyons at Rutgers 
University and Ronald 
Solomon at Ohio State 
University, is working 
on a “second genera-
tion” proof of the
classification theorem.

Group theorists. From left to right: Richard Lyons, Walter Feit, Ronald Solomon, Curtis
Bennett, Jon Carlson, and Gary Seitz. (Photo courtesy of Alexandra Feit.)

“The plan is to give a whole new proof,
mostly using the techniques that were developed for the first proof,
but with some new ones too, and to have it all be consistent,” 
explains Jonathan Alperin of the University of Chicago. Some of 
the second-generation proof has been completed; more is under 
way. Lyons and Solomon say the final product will still be pretty 
complicated, but they expect it to be much shorter than the original 
proof. At the very least, it will be possible to collect the entire 
proof on one long bookshelf, without what Lyons refers to as an 
“infinite regress” of references to other papers. 

But what are simple groups, and why should anyone want to 
classify them? For that matter, what are groups, and what does it
mean to classify them?

Figure 1. Translational symmetry (top) goes hand in hand with the group
of integers; rotational symmetry (bottom) corresponds to “clock” arithmetic.

Groups are fundamental algebraic structures that arise in the
study of symmetry. The set of integers, for example, forms a group
corresponding to translational symmetry (see Figure 1). Likewise, 
the hours on a clock form a group related to rotational symmetry. 
The former example is an infinite group, while the latter is finite. 
Similarly, the rotational symmetries of a sphere comprise an infi- 
nite group, while those of a cube comprise a finite group. 

In essence, any mathematical operation that can be undone (as 
subtraction “undoes” addition, or rotation clockwise “undoes” 
rotation counterclockwise) leads to a group. Many operations that 
at first glance look quite different turn out, abstractly, to define the
same group. To keep things straight, mathematicians like to classi- 
fy groups according to various properties, much as taxonomists sort 
critters into species. A first observation is that many groups are 
built from smaller components, which themselves are groups. In 
the finite case, the process of reduction to smaller groups must ter- 
minate; the irreducible components are called simple groups. 
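
The idea of a group containing smaller groups can be tried out on clock arithmetic. The sketch below is only an illustration: it treats the hours of an n-hour clock as a group under addition mod n and lists the sizes of the smaller groups inside it. The function names are hypothetical, and the code leans on the fact that every subgroup of a clock group is generated by a single element.

```python
def generated_subgroup(n, g):
    """Hours reachable on an n-hour clock by repeatedly adding g."""
    seen, hour = set(), 0
    while hour not in seen:
        seen.add(hour)
        hour = (hour + g) % n
    return seen


def proper_subgroup_sizes(n):
    """Sizes of subgroups strictly between the trivial group and the whole clock."""
    sizes = {len(generated_subgroup(n, g)) for g in range(n)}
    return sorted(sizes - {1, n})


print(proper_subgroup_sizes(12))  # a 12-hour clock contains smaller groups
print(proper_subgroup_sizes(7))   # a 7-hour clock contains none
```

A clock with a prime number of hours has nothing smaller inside it, which is the sense in which such a group cannot be reduced any further.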

Simple groups are analogous to atoms, from which molecules 
are built. Just as chemists arrange the elements in rows and 
columns of the “periodic chart,” group theorists have for decades 
sought to identify all the simple groups and classify them accord- 
ing to their fundamental properties. That’s what the Enormous 
Theorem does. 

The Enormous Theorem says that simple groups come in four 
flavors: cyclic, alternating, Lie-type, and sporadic. The first three 
families all have infinitely many members. The class of simple 
cyclic groups, for example, consists of all groups with a prime 
number of elements—one for each prime number. (A cyclic group 
can be interpreted as the set of rotational symmetries of a regular 
polygon, which “cycle” through a set of positions, hence the name.) 
Though infinite in extent, these three classes arise for reasons that 
are fairly simple, with well-understood connections among the 
groups in each class. The sporadic groups, on the other hand, seem 
to constitute little more than a grab-bag of leftovers. They are basi- 
cally defined by what they’re not: simple groups that don’t fit into 
the other three categories. They tend to be called by the name of 
the person who discovered them. One, however, is affectionately 
known as the Monster. It has more than $10^{53}$ elements, and can be
interpreted as a group of rotations in 196,883-dimensional space. 
Its astonishing connections with other areas of mathematics, 
including the theory of modular forms (see “Fermat’s Theorem—at 
Last!,” pages 2-13), led John Conway and Simon Norton, then at 
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don’t by themselves account 


Theorem began in earnest in 
the 1950s. A major break-
through came in 1963, when Walter Feit at Yale University and
John Thompson at the University of Chicago proved that every sim-
ple group, other than cyclic ones, has an even number of elements.
At 255 pages of dense mathematics, the Feit-Thompson paper hint- 
ed at how arduous the task of classifying the simple groups would 
be. 

In 1972, Daniel Gorenstein of Rutgers University outlined a 
“16-step plan” for completing the proof. Still, few group theorists 
expected the effort to succeed, at least not in this century. But a 
series of breakthroughs, particularly by Michael Aschbacher at 
Caltech, changed people’s outlook. Aschbacher introduced a suite 
of new techniques and used them to solve a slew of problems that 
had been identified as crucial to proving the classification theorem. 
At the same time, a result of Franz Timmesfeld at the University of 
Bielefeld suggested to researchers that they had seen the last of the 
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sporadic groups. Timmesfeld showed, in essence, that the primary 
“cause” of sporadic groups had been exhausted. There was still a 
lot of work to do, but “by 1976, almost everyone believed that the 
classification problem was ‘busted’,” Solomon writes in a survey of
the subject for the Notices of the American Mathematical Society 
(February, 1995). 

They were right. With Gorenstein coordinating the diverse 
efforts of scores of researchers, the last few pieces finally fell into 
place. In 1980, Gorenstein declared victory in what he dubbed “the 
Thirty Years War.” 

The proof, however, was unlike anything ever seen before in 
mathematics. Most proofs can be read and verified by anyone 
knowledgeable in the field. But most proofs are at most a few 
pages long, and those that aren’t self-contained rest either on a 
well-developed theory or on a handful of references. None of that 
held for the Enormous Theorem, and no one had the time, the 
patience, and the expertise to double-check every nook and cranny 
of the myriad papers that comprised its proof. Furthermore, 
researchers acknowledged, many of the individual papers almost 
certainly contained errors—usually bad news for a mathematical 
proof. 

“Viewed retrospectively and soberly, it was perhaps a bit hasty 
to claim that everything was finished before the manuscripts had 
been checked carefully, but it was quite understandable,” Solomon 
notes. “Mathematics is done by human beings, who have an emo- 
tional aspect to their personalities in addition to a rational one.” 

Nevertheless, group theorists are confident that whatever gaps or 
errors their unusual proof may contain, none is likely to be difficult 
to correct. (Andrew Wiles’s first stab at Fermat’s Last Theorem 
contained several minor mistakes, all of which he fixed easily, in 
addition to the more serious gap that took a year to correct.) “The 
basis for the reliability of [the proof] is that very many parts of it 
are extremely parallel,” explains Stephen Smith of the University 
of Illinois at Chicago. If a technique were misapplied in one paper, 
its proper use could almost certainly be found elsewhere. 
Mathematical proofs are often described as “chains” of logical rea-
soning: breaking any link is fatal. The proof of the Enormous
Theorem is more like cloth (or perhaps chain mail), which can sus- 
tain small tears without falling apart. As Smith puts it, “It’s very 
unlikely there’s any hole you could drive a truck through.” 

Daniel Gorenstein.

Gargantuan proofs that require many mathematicians’ efforts are
becoming more common, although the Enormous Theorem still
takes the cake (and the candles) for sheer size. “It’s a phenomenon 
of our time, to have mathematics that has theorems in it where the 
proofs take this long,” says Lyons. Jonathan Alperin agrees: 
“Group theory is the first field where there were single results 
proved by hundreds of pages of mathematics, but now it’s all over 
the place,” he says. “It’s a new kind of mathematics in a way. The 
size makes a real difference; it changes how people look at things 
and the amount of effort they’re willing to put in.” 

Still, it would be nice to have a simpler, shorter proof—some- 
thing, say, of Pantagruelian rather than Gargantuan size—especial- 
ly for a result as fundamental as the classification theorem. That’s 
what Lyons and Solomon are aiming for, continuing work of 
Gorenstein, who died in 1992. 

“Gorenstein and the two of us didn’t invent this idea of writing 
a second-generation proof,” Lyons notes. Indeed, bits of the proof 
were being rewritten as early as 1970. Lyons 
credits Helmut Bender at the University of 
Kiel in Germany as “the first person to really 
make substantial progress” on simplifying 
the Enormous Theorem. In 1994, Bender and 
George Glauberman at the University of 
Chicago published a second-generation 
proof of the Feit-Thompson result. 
Aschbacher, too, recently published a mono- 
graph on sporadic groups that collects and clarifies much of the ear- 
lier work on the subject. “What we’re doing has the same philoso- 
phy, but applied to the whole rest of the proof,” Lyons says. 

Lyons and Solomon foresee a dozen or more volumes contain- 
ing an essentially complete proof of the classification theorem, 
skipping only those parts that have already received thorough sec- 
ond-generation treatment. The first two volumes have already been 
published (by the AMS). Volume 1—a relatively slim 165 pages— 
is mainly an outline of what’s to follow. Volume 2 starts the proof 
in earnest. The authors estimate the complete series will weigh in 
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at around 5000 pages—hardly short, but a lot more readable than 
the original. In any case, the project will keep them off the streets 
well into the next century. “We’re just trying to keep a steady 
pace,” notes Lyons. 

Satisfying mathematicians’ craving for simplicity is not the only 
goal. “There were many reasons to go back and start a project like 
this,” says Lyons. One is to clarify the global logic of the proof— 
with the original so diffuse, it’s hard to see how all the pieces fit 
together. Indeed, says Solomon, “the state of the original proof is 
such that if everyone who worked on it should vanish, it would be 
very hard for future generations of mathematicians to reconstruct 
the proof out of the literature.” 

That’s understandable, notes Alperin. “When it was being done, 
the experts were writing for each other. They 
weren’t worrying too much about somebody 
reading it 20 years later. It always happens 
that way in math. The first climber up the
mountain doesn’t find the easiest route.” 

Another reason is to weed out any errors 
that may still mar the proof. Finally, notes 
Alperin, “there are things to be learned by 
redoing it.” By straightening out the strands 
of the original proof, Lyons and Solomon 
have already been able to stretch them fur-
ther, proving some of the component theo-
rems in considerably greater generality.
They and others working on the second-gen- 
eration proof have also found new applica- 
tions of the original proof’s techniques. 

That’s not unusual in mathematics. The Enormous Theorem 
may be unique in its length, but mathematicians often find new 
mathematics in old proofs. Who knows? Some member of 
Generation X, Y, or Z may one day find a proof of the Enormous 
Theorem that only takes a thousand pages—or, perhaps, fits in the 
margin of a computer disk. 
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The Secret Life of Large 


Numbers 


The ossifrage, also known as the lammergeier, is a rare, preda-
tory vulture of the Old World mountains, with a wing span up
to ten feet. The bird’s name means “bone breaker.” If that’s any 
indication, its habits are anything but squeamish. 

That bit of ornithological trivia has little relevance to mathemat- 
ics. But a squeamish ossifrage was recently sighted in the field of 
mathematical cryptography. In April, 1994, a loose-knit interna- 
tional team of code-breaking volunteers finally cracked a 17-year- 
old challenge problem. They decoded a 128-digit ciphertext (see 
Figure 1), obtaining the enigmatic message, “The magic words are 
squeamish ossifrage.”

The decoding of the ossifrage message highlights 20 years of 
progress in two seemingly disparate disciplines: the very practical 
concerns of secure communication and the highly theoretical world 
of number theory. The squeamish ossifrage had been locked away 
by a numerical code whose key was contained in the prime factors 
of a 129-digit number. 

The result is of significance to more than spies and number- 
crunchers. As computers do more and more in our daily lives, the 
need for secure communication increases. Anyone who punches a 
4-digit number into a cash machine or enters a password to log in to
a computer account already engages in secret communication. On- 
line banking and other new forms of computerized commerce will 
demand that machines protect the data they have access to. Data 
encryption, it’s generally agreed, is the answer. 

But where does number theory fit in? 

The story starts in 1976, when Whitfield Diffie and Martin 
Hellman at Stanford University introduced a radically new idea into 
cryptography: the notion of public key codes. It had always been 
tacitly assumed that knowing how to code messages meant, auto- 
matically, knowing how to decode them. For example, if you know 
that a code is based on an alphabetic shift, replacing A with B, B 
with C, and so forth, then it’s easy to decode “Uif nbhjd xpset bsf
trvfbnjti pttjgsbhf”: just shift everything backwards. Diffie and
Hellman challenged this “obvious” relationship of coding and
decoding. They proposed the possibility of codes for which know-
ing the coding algorithm (that is, knowing how to turn “plaintext”
into “ciphertext”) is no help in the decoding process.
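
The alphabetic shift is the textbook case of a code where knowing how to encode means knowing how to decode. A minimal sketch, with a hypothetical helper name, shows that the same routine run with the opposite shift undoes the coding:

```python
import string


def shift_text(text, k):
    """Shift each letter k places through the alphabet, keeping case and spaces."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + k) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


ciphertext = shift_text("The magic words are squeamish ossifrage", 1)
plaintext = shift_text(ciphertext, -1)   # decoding is just encoding in reverse
print(ciphertext)
print(plaintext)
```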

The following year, Ronald Rivest, Adi Shamir, and Leonard 
Adleman at M.I.T. proposed a practical implementation of Diffie 
and Hellman’s idea. (For more on Adleman, see “Computer 
Science Discovers DNA,” pages 26-37.) The M.I.T. trio’s scheme,
known as the RSA code, relies on a handful of theorems in ele- 
mentary number theory. It has been successful enough to spawn a 
company, RSA Data Security Inc., based in Redwood City, 
California. 

To illustrate the RSA code, suppose that Bob wants to send 
Alice the message “I love you.” He starts with a simple scheme 
(known to everyone) that translates letters into numbers, say A=01, 
B=02, etc. (with 00 for the space between words), so that the mes-
sage becomes the number M = 9001215220500251521. Bob then
looks up Alice’s public encryption algorithm, which consists of two
numbers, N and e (“e” stands for encryption). N is a large num-
ber, say with 200 digits. It happens to be the product of two large
primes, but that doesn’t matter to Bob. He just takes M, raises
it to the eth power, divides by N, and records the remainder, C. In
mathematical symbols, this is denoted $C = M^e \bmod N$. The
number C is the coded message. Bob sends it to Alice.

Alice has her own secret decryption number, d. To decode 
Bob’s message, she uses C, d, and N to compute the number 
$C^d \bmod N$. The result, as if by magic, is M, which Alice need
only translate back into letters. 

It’s not magic, of course. The result is built into number-theo-
retic relations among e, d, and the prime factors of N. If N = pq
for primes p and q, then e and d can be chosen as any two numbers
with the property that $(p - 1)(q - 1)$ divides $ed - 1$. In essence,
for each e there is a unique d that works. (To be precise, e cannot
have any factors in common with $p - 1$ or $q - 1$. In particular, it
can’t be even. But almost any large prime number will do.)
Moreover, given e, it’s actually quite easy to find the appropriate 
d—provided you know p and q. 
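
The whole exchange between Bob and Alice fits in a few lines of code. What follows is a toy sketch, not a secure implementation: the primes are two well-known Mersenne primes and the exponent 65537 is an assumed choice, both picked only so the example runs instantly, and the helper names are hypothetical. Real keys use secret primes of a hundred digits or so.

```python
from math import gcd

# Assumed toy parameters (publicly known primes, so no security at all).
p = 2**31 - 1
q = 2**61 - 1
N = p * q
e = 65537                      # assumed public exponent
phi = (p - 1) * (q - 1)
assert gcd(e, phi) == 1        # e shares no factor with p - 1 or q - 1
d = pow(e, -1, phi)            # Python 3.8+: the d with e*d = 1 mod (p-1)(q-1)


def letters_to_number(message):
    """A=01, ..., Z=26, space=00, strung together into one integer."""
    pairs = ("00" if ch == " " else f"{ord(ch) - ord('A') + 1:02d}"
             for ch in message.upper())
    return int("".join(pairs))


def number_to_letters(m):
    digits = str(m)
    if len(digits) % 2:
        digits = "0" + digits  # restore a leading zero dropped by int()
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    return "".join(" " if pr == "00" else chr(int(pr) + ord("A") - 1)
                   for pr in pairs)


M = letters_to_number("I LOVE YOU")
assert M < N                   # the message must be smaller than N
C = pow(M, e, N)               # Bob encodes with the public numbers N and e
recovered = pow(C, d, N)       # Alice decodes with her secret d
print(number_to_letters(recovered))
```

Computing d takes one line here only because p and q are known; without them, the modulus (p - 1)(q - 1) is unavailable.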

In theory, then, there is no secret. Anyone who wants to break 
Alice’s code need only find the factors p and q of her number N.
In principle, that’s easy for a computer to do. What makes the RSA
code “unbreakable” is that factoring a product of large prime
numbers, although conceptually simple, is fantastically time-con-
suming. For numbers hundreds of digits long, a zillion supercom-
puters working together couldn’t do it in a zillion years. Factoring
is hard work. 

Or is it? 

With straightforward trial and error, it certainly is. The maxi-
mum number of trial divisions needed to factor a number N is
roughly proportional to the square root of N. Unfortunately, more-
over, this maximum is quite often attained. Thus, factoring a 200-
digit product of two 100-digit primes this way can require on the
order of $10^{100}$ trials, beyond what any conceivable computer could
do in any conceivable amount of time. 
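
To see why the square root governs the cost, here is a minimal sketch of trial division that also counts its trials. The function name is hypothetical, and the loop tries every integer rather than only primes, which is the simplest version of the idea.

```python
def trial_division(n):
    """Return (smallest factor greater than 1, number of trial divisions used)."""
    trials = 0
    d = 2
    while d * d <= n:          # no need to look beyond the square root of n
        trials += 1
        if n % d == 0:
            return d, trials
        d += 1
    return n, trials           # nothing found: n itself is prime


print(trial_division(111))     # a small factor turns up quickly
print(trial_division(10403))   # 101 x 103: both factors sit near the square root
```

A product of two primes of similar size is the worst case, since the search has to run nearly all the way to the square root before anything divides.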

But trial-and-error is not the smartest approach to factoring large 
numbers. Mathematicians as far back as Fermat had come up with 
more efficient methods, based on number-theoretic principles. 
Fermat himself had a particularly interesting method: To factor N, 
first find numbers x and y such that $x^2 - y^2 = N$. Then the left-
hand side factors easily, into $(x + y)(x - y)$. For example,
$21 = 25 - 4 = 5^2 - 2^2 = (5 + 2)(5 - 2) = 7 \times 3$.
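
Fermat's difference-of-squares idea can be sketched as a short search: start x just above the square root of N and step upward until x squared minus N is a perfect square. The function name is hypothetical, and the sketch assumes N is odd and greater than 1.

```python
from math import isqrt


def fermat_factor(n):
    """Find x, y with x*x - y*y == n and return the pair (x + y, x - y)."""
    x = isqrt(n)
    if x * x < n:
        x += 1                 # start at the ceiling of the square root
    while True:
        y_squared = x * x - n
        y = isqrt(y_squared)
        if y * y == y_squared:
            return x + y, x - y
        x += 1


print(fermat_factor(21))
print(fermat_factor(5959))
```

The search is fast when the two factors are close together and slow when they are far apart, which is one reason the more flexible variants came later.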

A more flexible variant of Fermat’s method is to find numbers x 
and y such that N divides $x^2 - y^2$. Then there’s about a 50:50
chance that $x + y$ contains one factor of N and $x - y$ contains the
other. (If not, for instance if N simply divides x + y, then you keep
looking.) For example, 21 divides $10^2 - 4^2$, and the factors,
$10 + 4 = 14 = 2 \times 7$ and $10 - 4 = 6 = 2 \times 3$, again reveal the
factorization $21 = 7 \times 3$.
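
In code, the variant amounts to two greatest-common-divisor computations. This is a sketch with a hypothetical function name; it returns None in the unlucky case where the pair gives nothing new.

```python
from math import gcd


def split_from_squares(n, x, y):
    """Given n dividing x*x - y*y, try to split n using x + y and x - y."""
    assert (x * x - y * y) % n == 0
    f = gcd(x + y, n)
    g = gcd(x - y, n)
    if 1 < f < n and 1 < g < n:
        return f, g
    return None                # n divides x + y or x - y outright: keep looking


print(split_from_squares(21, 10, 4))    # the lucky case from the text
print(split_from_squares(21, 11, 10))   # 21 divides 11 + 10, so nothing is learned
```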

In the 1920s, Maurice Kraitchik proposed patching together 
values for x and y from smaller numbers. With N = 111, for 
example, a short search reveals that 


$11^2 \bmod 111 = 121 \bmod 111 = 10 = 2 \times 5$
$14^2 \bmod 111 = 196 \bmod 111 = 85 = 5 \times 17$
$16^2 \bmod 111 = 256 \bmod 111 = 34 = 2 \times 17$


where the notation a mod N refers to the remainder one
gets when a is divided by N. It now follows that 111
divides $(11 \times 14 \times 16)^2 - (2 \times 5 \times 17)^2$, which factors into
$2634 \times 2294$. The factors of 111 can now be sought by a method
called the Euclidean algorithm, which has been known for more 
than two thousand years. The Euclidean algorithm produces the 
greatest common divisor of two numbers by computing a sequence 
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of remainders. For example, the greatest common divisor of 111 
and 2294 is found by computing 


2294 mod 111 = 74 
111 mod 74 = 37 
74 mod 37 = 0 


which shows that 37 divides both 111 and 2294. A similar calcu- 
lation reveals 3 as a common factor of 111 and 2634, which com-
pletes the factorization $111 = 37 \times 3$.
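
The N = 111 example can be replayed step by step. The sketch below, with a hypothetical function name, recomputes the three remainders, patches them into a difference of squares, and runs the Euclidean algorithm while recording the sequence of remainders it produces.

```python
def euclid(a, b):
    """Return gcd(a, b) together with the remainders computed along the way."""
    remainders = []
    while b:
        a, b = b, a % b
        remainders.append(b)
    return a, remainders


N = 111
residues = [x * x % N for x in (11, 14, 16)]   # the three small remainders
X = 11 * 14 * 16
Y = 2 * 5 * 17                                 # square root of their product
assert (X * X - Y * Y) % N == 0

print(residues)
print(X + Y, X - Y)
print(euclid(X - Y, N))
print(euclid(X + Y, N)[0])
```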

Michael Morrison at the University of California at Los Angeles 
and John Brillhart at the University of Arizona systematized the 
patchwork approach to factorization in the 1970s, using linear alge- 
bra to pick out which squares to combine. Their method amounts 
to, essentially, specifying a collection of small primes, called a fac-
tor base, and then sifting through a large number of candidate val-
ues x, keeping only those for which the prime divisors of
$x^2 \bmod N$ (the remainder of $x^2$ when divided by N) belong to the
factor base. The surviving factorizations of $x^2 \bmod N$ over the
factor base are recorded in a matrix of 0’s and 1’s. Each row cor-
responds to an x, and each column to a prime in the factor base.
The entry in “row x, column p” specifies whether p divides
$x^2 \bmod N$ an even or an odd number of times (see Table 2, page
99). Once a suitably large supply of x’s is in hand, linear algebra 
kicks in, to help find a combination all of whose exponents over the 
factor base are even. The theory of linear algebra guarantees that 
some combination exists whenever there are more rows than 
columns. Not every combination succeeds in factoring the number 
N, but with many possible combinations—that is, with many more 
rows than columns—the odds go way up that at least one will do 
the job. 
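
The linear algebra step can be sketched on the small N = 111 example. Each row of 0's and 1's is stored as a bitmask, and elimination mod 2 looks for a set of rows that add up to all zeros, meaning every exponent in the combined product is even. The function names and the tiny factor base are assumptions made for illustration; production codes use far more elaborate sparse-matrix methods.

```python
from math import gcd, isqrt, prod


def parity_vector(value, factor_base):
    """Bitmask of exponent parities, or None if value does not factor over the base."""
    if value <= 0:
        return None
    vec = 0
    for bit, p in enumerate(factor_base):
        while value % p == 0:
            value //= p
            vec ^= 1 << bit        # flip the parity for each factor of p
    return vec if value == 1 else None


def find_dependency(rows):
    """Return indices of rows whose sum mod 2 is zero, or None if there is none."""
    pivots = {}                    # leading bit -> (reduced row, rows combined so far)
    for i, vec in enumerate(rows):
        combo = 1 << i
        while vec:
            top = vec.bit_length() - 1
            if top not in pivots:
                pivots[top] = (vec, combo)
                break
            pivot_vec, pivot_combo = pivots[top]
            vec ^= pivot_vec
            combo ^= pivot_combo
        else:                      # row reduced to zero: a dependency
            return [j for j in range(len(rows)) if (combo >> j) & 1]
    return None


N = 111
factor_base = [2, 5, 17]
xs = [11, 14, 16]
values = [x * x % N for x in xs]
rows = [parity_vector(v, factor_base) for v in values]
chosen = find_dependency(rows)
X = prod(xs[i] for i in chosen)
Y = isqrt(prod(values[i] for i in chosen))
print(chosen, gcd(X - Y, N), gcd(X + Y, N))
```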

This was how things stood in 1977, when Rivest, Shamir, and 
Adleman proposed their public-key cryptography system. They 
reckoned that a computer doing a million operations per second 
could factor a 50-digit number in about 4 hours, but a 100-digit 
number would take the better part of a century, and a 200-digit 
number around 4 billion years. Even allowing for a million-fold 
speedup in computing (which is yet to happen), a code based on a 
200-digit number seems pretty secure. 

Martin Gardner described the RSA code in his August, 1977 
Scientific American column. To demonstrate the technique’s 
power, the M.I.T. researchers encoded the ossifrage message, using
a 129-digit number for N and a 4-digit number for e. Gardner pub-
lished the resulting ciphertext, along with N and e, and the M.I.T.
group’s offer of $100 to the first person to crack the code. 

That $100 looked safe, at least for the next 20,000 years or so: 
The M.I.T. group estimated it would take around 23,000 years to
factor a 129-digit number. (By then, at say 6% interest compound- 
ed annually, the $100 would be worth something in the 500-digit 
range.) Advances in computer speed might knock off an order of 
magnitude or so, but that still seemed to leave a wide margin of 
safety for the number known as RSA-129. 
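
The compound-interest aside is easy to check with logarithms. This is a back-of-the-envelope sketch, assuming exactly 23,000 years of annual compounding at 6%; it counts the digits of the final sum rather than computing the sum itself.

```python
from math import floor, log10

principal, rate, years = 100, 0.06, 23_000

# Number of digits of principal * (1 + rate) ** years
digits = floor(log10(principal) + years * log10(1 + rate)) + 1
print(digits)
```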

Not so. The RSA challenge number yielded a mere 17 years 
later to a calculation that took less than a year from start to finish. 
A loosely organized group of factoring aficionados, numbering
over 600 individuals in more than two dozen countries, nailed the 
64- and 65-digit prime factors of RSA-129 in an eight-month effort, 
ending in April, 1994. 

The project to factor RSA-129 relied on technological develop- 
ments that were just appearing on the horizon in 1977. One was the 
Internet. The heart of the effort, though, was an algorithm called 
the Quadratic Sieve. 

Introduced in 1981 by Carl Pomerance at the University of 
Georgia, the Quadratic Sieve in essence revs up Kraitchik’s algo- 
rithm and puts the Morrison—Brillhart machinery into high gear. 
The hang-up in the patchwork approach had been the difficulty of 
finding numbers x for which $x^2 \bmod N$ factors over the factor
base. Most numbers don’t have this property, and that makes for 
lots of wasted, trial-and-error computation. The Quadratic Sieve 
streamlines this process and thus greatly accelerates the construc- 
tion of Morrison and Brillhart’s 0-1 matrix. 

The Quadratic Sieve works roughly as follows (see Box on 
pages 98-99). A range of values x slightly greater than the square
root of N is considered. For each prime p in the factor base, the
first p values of $x^2 - N$ are examined, to identify ones that are
divisible by p. (The factor base must be tailored to the number N,
so that each prime in it divides at least one value of $x^2 - N$.) Once
such a value is identified, every pth value thereafter is automatical- 
ly also divisible by p. This idea is the key to the algorithm; it elim- 
inates the need for expensive trial divisions, once the process has 
begun. The algorithm then “sieves” the numbers x, successively 
dividing each appropriate value of $x^2 - N$ by the primes in the fac-
tor base. Those values that end up fully factored, with all prime
divisors in the factor base, are recorded in the 0-1 matrix.
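
The sieving step described above can be sketched in a few lines. This is a simplified illustration with a hypothetical function name and a toy factor base: it checks only the first p positions for divisibility by p and then strides through the rest of the interval, which is where the savings over trial division come from. Real implementations work with approximate logarithms instead of exact division.

```python
from math import isqrt


def sieve_smooth(n, factor_base, interval):
    """Return (x, x*x - n) for x just above sqrt(n) whose value factors over the base."""
    start = isqrt(n) + 1
    values = [(start + i) ** 2 - n for i in range(interval)]
    leftover = values[:]                 # divided down as the sieve proceeds
    for p in factor_base:
        # Look only at the first p positions to find where p divides x*x - n ...
        for first in range(min(p, interval)):
            if values[first] % p == 0:
                # ... then every p-th position after that is divisible too.
                for i in range(first, interval, p):
                    while leftover[i] % p == 0:
                        leftover[i] //= p
    return [(start + i, values[i]) for i in range(interval) if leftover[i] == 1]


print(sieve_smooth(111, [2, 5, 17], 10))
```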

The Quadratic Sieve can be jazzed up in two ways significant for 
the RSA-129 project. First, it may pay to record numbers that fac-
tor only partially over the factor base. A number $x^2 \bmod N$ that
leaves one or two large prime factors after division by the factor-
base primes may still be useful if it can be combined with another
such number. If N = 991241, for example, one finds that
$997^2 \bmod N = 2^4 \times 173$ and $1079^2 \bmod N = 2^3 \times 5^3 \times 173$,
so that $(997 \times 1079)^2 \equiv 2 \times 5 \times (2^3 \times 5 \times 173)^2 \pmod{N}$. A combination
of partial factorizations that brings the “extraneous” primes to an 
even power is called a cycle. 
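
The N = 991241 example can be checked directly. In this sketch the factor base [2, 3, 5, 7] and the function name are assumptions made for illustration; the point is that each remainder leaves the same large prime behind, so their product carries that prime to an even power.

```python
N = 991241
small_primes = [2, 3, 5, 7]    # hypothetical factor base for this illustration


def split_over_base(value, base):
    """Divide out the factor-base primes; return (exponents, leftover cofactor)."""
    exponents = {}
    for p in base:
        while value % p == 0:
            value //= p
            exponents[p] = exponents.get(p, 0) + 1
    return exponents, value


r1 = 997 ** 2 % N
r2 = 1079 ** 2 % N
exp1, big1 = split_over_base(r1, small_primes)
exp2, big2 = split_over_base(r2, small_primes)

# Two partials sharing a leftover prime combine into one usable relation.
combined = {p: exp1.get(p, 0) + exp2.get(p, 0) for p in small_primes}
print(exp1, big1)
print(exp2, big2)
print(combined)
```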

Second, the sieve can be run not only with the
expression $x^2 - N$, but also on more general qua-
dratic polynomials of the form $(ax + b)^2 - N$.
This variant, called the Multiple Polynomial 
Quadratic Sieve, has the advantage that each 
polynomial sieve works independently. This 
allows the job to be distributed among many dif- 
ferent computers; each machine reports back to 
headquarters the full and partial factorizations it 
finds. Then a coordinating computer searches for 
cycles among the partial factorizations, and, final-
ly, does the 0-1 matrix computation.

That’s how RSA-129 was factored. The pro-
ject was launched by Paul Leyland at Oxford
University, Michael Graff at Iowa State
University, and Derek Atkins at M.I.T., using pro-
grams developed by Arjen Lenstra at Bell


Research Center in Palo Alto, California. The group’s factor base 
included 524,339 primes. They arranged to distribute programs 
and databases via the Internet, and then solicited volunteers. “Your 
donations of idle cycles on your PC’s, workstations, supercomput- 
ers and fax machines may not be tax deductible, but they are truly 
a charitable donation,” Leyland told the troops in one e-mail com- 
muniqué. (He wasn’t kidding about the fax machines: Someone in 
the U.S. had actually figured out how to make a fax machine’s com- 
puter sieve numbers between phone calls.) 

Results accumulated over the next eight months, with around 30 
thousand full and partial factorizations coming in every day, more
on weekends, when machines tend to have more free time. By
April, 1994, they had logged 8,424,486 factorizations: 112,011 
“fulls,” 1,431,337 “single partials,” and 6,881,138 “double par-
tials.” Using the partials, they managed to construct 457,455
cycles. That added up to a 0-1 matrix with 569,466 rows, more 
than enough to guarantee the existence of combinations with even 
exponents for all 524,339 primes in the factor base. 

The last difficult step was to find those combinations. After that, 
factoring RSA-129 would be a snap (unless, of course, none of the 
combinations worked, which was highly unlikely). Again, the task 
is conceptually simple, but the size of the matrix makes it compu- 
tationally difficult. In these days of gigabyte hard disks, a half mil- 
lion rows and columns may not sound too formidable, until you 
remember to multiply: The matrix for RSA-129 had almost 300 
billion entries. 

Lenstra did the
“structured Gauss.” This was possible because the 0-1 matrix in
any quadratic sieve calculation contains mostly 0’s—just a few 
primes in the factor base appear in each factorization, and some of 
those will appear to an even power. The smaller matrix, in effect, 
eliminates the tedious stretches of 0’s. 

The second step was to find a suitable combination using the 
smaller matrix. This was still a job for a supercomputer, because 
even the smaller matrix was pretty hefty, weighing in at 188,346 
rows and 188,146 columns. That comes to more than 35 billion 0’s 
and 1’s, or more than 4 gigabytes of data. (Each 0 or 1 is a single 
data “bit”; a “byte” is 8 bits.) These days, 4 gigabytes is not con-
sidered an outrageously large data file, but the computation Lenstra
planned required that every single bit be correct—even one mistak- 
en entry in the matrix would ruin the result. “Hardly ever do you 
find a file where all 4 gigabytes are significant,” Lenstra notes.
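
The sizes quoted for the two matrices follow from simple multiplication. The sketch below, with a hypothetical helper name, assumes one bit per entry and counts a gigabyte as a billion bytes.

```python
def dense_size(rows, cols):
    """Return (number of 0-1 entries, size in gigabytes at one bit per entry)."""
    bits = rows * cols
    return bits, bits / 8 / 1e9


full_bits, full_gb = dense_size(569_466, 524_339)     # the original matrix
small_bits, small_gb = dense_size(188_346, 188_146)   # after structured Gauss
print(full_bits, round(full_gb, 1))
print(small_bits, round(small_gb, 1))
```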
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But the calculation worked. A suitable combination was found 
from the 0-1 matrix. With it came the factors of RSA-129, and 
then it was a simple matter to compute the squeamish ossifrage. 

Remarkable as it is, the factorization of RSA-129 does not com- 
promise the security of current RSA-based codes. The computer 
power it takes to factor numbers still skyrockets with the number of 
digits. The factoring folks figure they’ll be able to attack 150-digit
numbers in the near future—a new technique called the Number 
Field Sieve, which expands the Quadratic Sieve into the realm of 
algebraic numbers, has shown promise. But the cryptography 
crowd can easily stay ahead, just by basing codes on larger and 
larger numbers. “We’re recommending that people use 200- to 
300-digit numbers,” notes Rivest. Barring a dramatic breakthrough
in computational number theory, factorization will remain a hard 
problem for a long time. But progress, like the integers themselves, 
is something number theorists know they can count on. 
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In Math We Trust 


Most people think of mathematics as an abstract, ethereal pur-
suit, far removed from such mundane matters as earning a
living or making ends meet. 

But what concept is really more abstract than that of money?
Think about it: For a week’s sweat and toil, what do you get? A
piece of paper with a number on it. Sometimes you pass the paper
along in exchange for something you can actually eat or wear or 
otherwise enjoy. Other times you turn it over to someone who 
promises to give you a bigger number back someday. More and 
more often you just throw the piece of paper away because all it 
does is report that the organization you work for has sent the same 
number to some other organization whose raison d’etre is to record 
and report numbers. The work of a good many people, in fact, is to 
keep tabs on all the numbers that are flying around. 

And the numbers fly fast and furious these days. In olden days 
the numbers came in the form of little rocks or chunks of metal 
(some of the smaller numbers still do). When people learned to 
read and write, the numbers started appearing on paper. But when 
the telephone and other means of long-distance communications 
came along, rocks, metal, and paper could no longer move fast 
enough to keep up with the speed of transactions. Today most of 
the numbers that society depends on are reported and recorded and 
exchanged in truly ethereal forms. They’re carried by electrons or 
electromagnetic waves and stored invisibly as variations in a mag- 
netic field. 

Finance—the art of circulating money—is inherently mathemat- 
ical. Figuring out how to turn a profit from a financial transac- 
tion—or, just as important, guarding against financial ruin—always 
involves someone, somewhere, doing a calculation. Computers, of 
course, now handle much of the grunt work of keeping accounts, 
amortizing loans, and dunning us for late payments. But comput- 
ers by themselves can do only so much. Given the Byzantine inter- 
connectedness of the world economy, the vast sums at stake, and 
the pace of modern communications, computer models of financial 
markets must rely more and more on sophisticated mathematical 
methods—many of which embody abstract mathematics of the 
most ethereal kind. 

Just ask Joseph Traub and Henryk Woźniakowski.
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Joseph Traub. (Photo courtesy of Joe 
Pineiro, Columbia University.)

Figure 1. In calculus, the integral is
defined as the area beneath the curve. 
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Five years ago, Traub, a professor of computer science at 
Columbia University, was lecturing on unsolved problems in theo- 
retical computer science. One problem concerned the “best” way 
to approximate the numerical value of a multivariate integral—an 
abstract-sounding problem, but one with important applications in 
areas such as mathematical physics. Somewhat jokingly, Traub 
offered a $64 “bounty” for a solution. 

Traub’s joint appeal to mathematical curiosity and financial 
interest worked. Within a week, Woźniakowski, who holds joint 
appointments at Columbia and at the University of Warsaw, in his 
native Poland, solved his colleague’s $64 question. Traub paid 
up, and Woźniakowski announced his solution in the Bulletin of 
the American Mathematical Society. 

For many problems, that would be as far as things would go. But 
not for this one. Three years later, Traub got a call from Irwin 
Vanderhoof, a professor of finance at New York University. 
Vanderhoof had read a news story on Woźniakowski’s result, and 
had a practical application in mind. 

Investment banks, it turns out, calculate multivariate integrals 
by the truckload. With Woźniakowski’s result, Vanderhoof thought, 
the work might be doable more quickly and accurately, and 
therefore, presumably, more profitably. Vanderhoof put Traub and 
his colleagues in touch with analysts at Goldman Sachs, one of 
Wall Street’s best known financial houses. Recent results the 
Columbia researchers have obtained indicate their methods may pay 
huge dividends. 

Before plunging into financial matters, though, what is a 
multivariate integral, what was Traub’s question, and how did 
Woźniakowski solve it? 

Multivariate integrals arise in calculus of several variables. In 
elementary calculus, a continuous function (of a single variable) 
describes a curve in the plane. If a function f(x) is positive 
between two values of x, say a and b, then the integral 
$\int_a^b f(x)\,dx$ is, essentially by definition, the area 
beneath the curve (see Figure 1). When divided by the length of 
the interval, the integral can be interpreted as the average value 
of the function. In particular, when the interval runs from 0 to 
1, the integral is the average value of the function. 

Multivariate integrals have a similar interpretation. If a 
continuous function $f(x_1, x_2, \ldots, x_d)$ is defined on the 
“unit cube,” in which each variable takes on values between 0 and 
1, then the integral 
$\int_0^1 \cdots \int_0^1 f(x_1, \ldots, x_d)\,dx_1 \cdots dx_d$ 
can be viewed as the average value of the function. 

That interpretation suggests one way to approximate an integral: 
Pick a bunch of points at random, evaluate the function at those 
points, and compute the average of those function values. The 
result is only an approximation to the integral, but theorists have 
shown that if you sample the function at $N$ randomly chosen 
points, the expected error will equal some constant multiple of 
$1/\sqrt{N}$ (the constant depends on the amount of variation in 
the function). Phrased differently, if you want to be confident 
that your error is less than $\varepsilon$ (mathematicians’ 
favorite symbol for a small quantity), then the number of sample 
points you’ll need is proportional to $1/\varepsilon^2$. 
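
The random-sampling recipe can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the test integrand (the product of the coordinates, whose exact integral over the unit cube is 2^-d) are invented for the example and are not taken from the researchers' work.

```python
import math
import random


def monte_carlo_integral(f, d, n_samples, seed=0):
    """Average f over n_samples uniform random points in the d-dimensional unit cube."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    total = 0.0
    for _ in range(n_samples):
        point = [rng.random() for _ in range(d)]
        total += f(point)
    return total / n_samples


def product_of_coordinates(point):
    # Hypothetical test integrand: x_1 * x_2 * ... * x_d, exact integral 2**(-d).
    return math.prod(point)


if __name__ == "__main__":
    d = 3
    exact = 0.5 ** d
    for n in (100, 10_000):
        estimate = monte_carlo_integral(product_of_coordinates, d, n)
        print(n, estimate, abs(estimate - exact))
```

On a typical run the error shrinks by roughly a factor of 10 when the sample grows by a factor of 100, in line with the $1/\sqrt{N}$ rule, though any single run can do better or worse.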

This random-sampling approach is the most 
commonly used method for evaluating complicat- 
ed, high-dimensional integrals. It is known as the 
Monte Carlo method. (The name was coined in the 
1940s by Nicholas Metropolis, a physicist at the 
Los Alamos National Laboratory. The Monte 
Carlo method applies more generally to a class of 
algorithms that use random numbers as a computa- 
tional aid. The idea was first proposed in 1946, by Stanislaw Ulam, 
also at Los Alamos, as a quick way of computing chain reactions in 
nuclear fission.) 

For some problems the Monte Carlo method makes obvious 
sense. But why use random numbers for multivariate integration? 
A perfectly sensible-sounding approach is to evaluate the function 
at regularly spaced grid points. This is what calculus texts always 
show for the one-dimensional case: To approximate an integral 
numerically, one first divides the interval into a large number of 
segments of equal width, say n segments of width 1/n. One then 
evaluates the function at, say, the right-hand endpoint of each 
segment, adds up all these function values and divides by n. In 
symbols: $\int_0^1 f(x)\,dx \approx \frac{1}{n}\left(f(1/n) + \cdots + f(n/n)\right)$ 
(see Figure 2). The d-dimensional analog is to form a regular grid 
by chopping the unit interval in each direction into n segments. 
Again, the integral is approximated by summing the values of the 
function at the grid points and dividing by the total number of 
grid points, which is $n^d$ (see Figure 3). 

Figure 2. Approximating an integral with a grid of equally spaced 
points. 
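
The grid rule translates almost directly into code. The sketch below is a minimal illustration, not anyone's production method; the function names and the test integrand (the sum of the coordinates) are assumptions chosen so the answer can be checked by hand.

```python
import itertools


def grid_integral(f, d, n):
    """Average f over the n**d grid points whose coordinates are k/n for k = 1..n."""
    coords = [k / n for k in range(1, n + 1)]  # right-hand endpoints in one direction
    total = 0.0
    for point in itertools.product(coords, repeat=d):
        total += f(point)
    return total / n ** d


def sum_of_coordinates(point):
    # Hypothetical test integrand: x_1 + ... + x_d, exact integral d/2.
    return sum(point)


if __name__ == "__main__":
    # In two dimensions the exact integral is 1; the right-endpoint grid
    # with n = 10 gives (n + 1)/n = 1.1, an error of 1/n.
    print(grid_integral(sum_of_coordinates, 2, 10))
```

The loop visits all n^d grid points, which is harmless for d = 2 but is exactly the cost that becomes prohibitive as d grows.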

It sounds simple, but there are two prob- 
lems. First, even in the one-dimensional 
setting, there’s no guarantee the answer will 
be anywhere close to correct: No matter 
how large n is, the “worst case” error is 
arbitrarily large (see Figure 4). In fact, no 
numerical algorithm can avoid this prob- 
lem; it is a property of continuous func- 
tions. 
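
A small experiment makes the worst case concrete. The integrand below is an assumption chosen for the illustration: it is continuous, vanishes at every grid point k/n, and yet has an integral of height/2, so the grid estimate can be made as wrong as one likes by raising the height.

```python
import math


def right_endpoint_rule(f, n):
    """One-dimensional grid estimate: average of f at the points 1/n, 2/n, ..., n/n."""
    return sum(f(k / n) for k in range(1, n + 1)) / n


def spiky(n, height):
    """Continuous function that is zero at every k/n; its integral over [0, 1] is height/2."""
    return lambda x: height * math.sin(n * math.pi * x) ** 2


if __name__ == "__main__":
    n = 10
    f = spiky(n, height=1000.0)
    # The grid sees (essentially) nothing, although the true integral is 500.
    print(right_endpoint_rule(f, n))
    # A grid that is not aligned with the zeros does far better.
    print(right_endpoint_rule(f, 7))
```

The function is tailored to the grid, which is the point: for any fixed grid some continuous function defeats it.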

This difficulty can be ducked by assuming that the functions to be 
integrated are not only continuous but have some degree of 
smoothness. However, that assumption is often too restrictive. 
Another way to get around the difficulty is to look at 
average-case errors rather than worst-case errors. Functions that 
produce large errors in numerical integration are, in a certain 
technical sense, rare: They lie in the unlikely extremes of a 
probability distribution defined on the space of continuous 
functions. This distribution (known as Wiener measure) is 
analogous to the familiar bell-shaped curve of elementary 
probability and statistics (see Figure 5): the thick middle of the 
distribution is occupied by well-behaved functions, while their 
wilder, error-prone kin lie in the outer tails. In the 
average-case setting, numerical integration based on a uniform 
grid of sample points makes perfectly good sense: The expected 
error is approximately equal to the spacing between grid points, 
i.e., 1/n. 

Figure 3. An integral over the unit square in two dimensions can 
be interpreted as the average height of the surface. 

That’s where the second, more serious problem arises: To maintain 
a given level of accuracy, the uniform-grid approach requires an 
amount of computation that increases exponentially with the number 
of variables. Suppose, for example, you need three-digit accuracy 
(on average) for your numerical integrations. For a 
one-dimensional problem, your sampling grid would need a thousand 
points. But a two-dimensional problem requires a grid with a 
million (= 1000 × 1000) points, a three-dimensional problem 
requires a billion points, and a problem with a mere 10 variables 
requires more grid points than most people have patience for: even 
if you could evaluate your function at a hundred million grid 
points per second, it would take over three hundred trillion years 
to finish the job. And things get worse as you demand more 
accuracy: A hundred million calculations may be worth doing for 
eight-decimal-place accuracy in a one-variable problem, but doing 
ten quadrillion ($10^{16}$) calculations for a two-variable 
problem probably isn’t worth the wait. 
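
The arithmetic behind these figures is easy to check. The sketch below assumes, as the text does, that k digits of accuracy call for 10^k grid points in each direction and that the machine evaluates a hundred million points per second; the function names are invented for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
EVALUATIONS_PER_SECOND = 1e8  # "a hundred million grid points per second"


def grid_points(digits, d):
    """Grid points needed for `digits` digits of accuracy in d variables."""
    return (10 ** digits) ** d


def years_to_finish(digits, d):
    """Running time in years at the assumed evaluation rate."""
    return grid_points(digits, d) / EVALUATIONS_PER_SECOND / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(grid_points(3, 2))       # a million points
    print(grid_points(3, 3))       # a billion points
    print(years_to_finish(3, 10))  # about 3.2e14, over three hundred trillion years
    print(grid_points(8, 2))       # ten quadrillion points
```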

The financial calculations that Traub and Woźniakowski were to 
learn of from Vanderhoof call for integrations of functions with 
360 variables, one for each month of a 30-year mortgage. To 
achieve even one-digit accuracy by the straightforward approach 
would take $10^{360}$ calculations, an absurd number even for a 
mathematician. 

Exponential increase in the amount of computation required for 
an answer plagues many algorithms in computer science. When it’s 
associated with the number of variables in a problem, as for multi- 
variate integration, theorists call it the “curse of dimensionality.” 

The Monte Carlo approach is one way to break the curse of 
dimensionality for multivariate integration: The amount of 
computation it requires is proportional to $1/\varepsilon^2$, 
regardless of the number of variables. But the method has its 
drawbacks. One is the unavoidable risk of getting a completely 
wrong answer, no matter how much analysis you do. It can also be 
somewhat unsettling to get a different answer each time you do a 
calculation. The challenge Traub posed was to find a deterministic 
alternative to Monte Carlo. 

Figure 4. Functions with regularly spaced zeros are quite common. 
Regularly spaced grid points can produce a bad estimate for the 
integral of such a function. 

Figure 5. The bell-shaped curve. 

Figure 6. The shaded rectangle with one corner at (1, 1) contains 
4 of 10 “random” points in the unit cube, or 40%. The average 
percentage for all such rectangles is called the discrepancy for 
the given set of points. 

Theorists had already proved that such an alternative exists. In 
essence, the effectiveness of random sampling in the Monte Carlo 
method guarantees the average effectiveness of some deterministic 
sample. Unfortunately, the existence proof gave no hint as to 
which deterministic sample to use. Traub offered “big bucks” for 
an answer: an explicit algorithm for multivariate integration that 
broke the curse of dimensionality in the average-case setting. 

Enter Woźniakowski. He had been thinking about the effects of 
using pseudo-random numbers in algorithms that rely on randomness, 
including the Monte Carlo method for multivariate integration. He 
came to realize that he could translate the average-case error 
problem into another problem, known as the discrepancy problem. 
Roughly speaking, the discrepancy of a set of sample points in the 
unit cube is a measure of how uniformly dispersed they are. It is 
computed by comparing the fraction of sample points that appear in 
certain boxes within the unit cube (see Figure 6) with the volumes 
of those boxes, averaged over all such boxes. What Woźniakowski 
realized is that the discrepancy of a set of points equals the 
average error when those points are used to approximate 
multivariate integrals. Traub’s problem, therefore, amounted to 
identifying sets of low discrepancy. 
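
One common way to make "averaged over all such boxes" precise is the L2-star discrepancy, which takes the root-mean-square gap between the fraction of points in a box and the volume of the box. The sketch below assumes that convention, with boxes anchored at the origin rather than at (1, 1), and uses a closed-form expression usually attributed to Warnock. It illustrates the idea only and is not necessarily the exact quantity in Woźniakowski's theorem.

```python
import math


def l2_star_discrepancy(points):
    """Root-mean-square gap, over boxes [0, t), between the fraction of
    points inside the box and the volume of the box (closed form)."""
    n = len(points)
    d = len(points[0])
    # Integral of volume squared over all boxes.
    term1 = 3.0 ** (-d)
    # Cross term between box volume and point counts.
    term2 = (2.0 / n) * sum(
        math.prod((1.0 - x * x) / 2.0 for x in p) for p in points
    )
    # Integral of the squared counting fraction.
    term3 = sum(
        math.prod(1.0 - max(a, b) for a, b in zip(p, q))
        for p in points
        for q in points
    ) / n ** 2
    return math.sqrt(max(term1 - term2 + term3, 0.0))


if __name__ == "__main__":
    spread_out = [[0.25], [0.75]]
    clustered = [[0.90], [0.95]]
    print(l2_star_discrepancy(spread_out))  # smaller: points cover the interval
    print(l2_star_discrepancy(clustered))   # larger: points bunch up near 1
```

The double sum costs on the order of N^2 operations, so this form is practical only for modest point sets.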

That seems, of course, only to trade one hard problem for another. 
But Woźniakowski was in luck: The discrepancy problem had already 
been solved. In two papers written 26 years apart, Klaus Roth at 
Imperial College in London proved that the smallest possible value 
for the discrepancy of $N$ points is asymptotically proportional 
to $N^{-1}(\log N)^{(d-1)/2}$. The first paper, published in 1954, 
established the estimate as a lower bound; the second paper, from 
1980, proved the estimate to be sharp. 

Theorists have subsequently identified many sequences of sample 
points with low discrepancy, typically of order 
$N^{-1}(\log N)^d$, which is close to the bound obtained by Roth. 
These sequences are named after the people who designed them. So 
Wall Street is starting to hear such surnames as Faure, Halton, 
Hammersley, Niederreiter, and Sobol (see Figure 7). Curiously, the 
construction of low-discrepancy sets has close connections with 
number theory. The Hammersley points, for example, are defined in 
terms of the first d prime numbers, about as far from mundane 
matters as one ever gets. 

Figure 7. An example of 128 randomly chosen points (top) and 128 
Sobol points (bottom). 
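
To show how prime numbers enter, here is a sketch of the Halton sequence, a close relative of the Hammersley points, which is used here because its definition is the simplest to state. Coordinate j of point i is obtained by writing i in base p_j, the j-th prime, and reflecting the digits about the radix point. The function names are assumptions made for the illustration.

```python
def radical_inverse(i, base):
    """Reflect the base-`base` digits of the integer i about the radix point."""
    result = 0.0
    scale = 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * scale
        scale /= base
    return result


def first_primes(d):
    """The first d primes, found by trial division (fine for small d)."""
    primes = []
    candidate = 2
    while len(primes) < d:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def halton_points(n, d):
    """First n Halton points in the d-dimensional unit cube."""
    bases = first_primes(d)
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n + 1)]


if __name__ == "__main__":
    for point in halton_points(4, 2):
        print(point)  # first coordinates run 1/2, 1/4, 3/4, 1/8
```

In base 2 the first coordinates fill the interval by repeated halving, which is what keeps the points evenly spread without any randomness.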

So what does any of this have to do with finance? 

The answer is derivatives. To students of calculus, where a 
derivative measures the slope of a curve at a point, that may just 
sound like more mathematics. But in finance, a derivative is a dif- 
ferent beast. Roughly speaking, it’s any financial instrument whose 
value is derived from the value of something else. “Call options” 
are one simple example. Imagine a stock that now sells for $10 a 
share, with a reasonable expectation of doubling in value within a 
year. Someone (such as a bank or investment company) with a 
large holding of the stock might wish to raise some immediate cap- 
ital by selling investors the option to buy the stock at $15 per share 
at any time in the next year. If the call option costs, say $1 per 
share, then an option holder can realize a profit 
as soon as the stock goes over $16. If indeed the 
stock goes to $20 within the year, the holder can 
realize a 500% return on his or her $1 invest- 
ment. The option seller, on the other hand, gets 
both some immediate cash and a hedge against 
the chance that the stock might not live up to 
expectations. 
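
The option holder's arithmetic can be written out as a short calculation. The sketch uses the numbers from the example (a $15 strike and a $1 premium per share); the function name is invented for the illustration, and transaction costs and the time value of money are ignored.

```python
def call_option_net_profit(stock_price, strike=15.0, premium=1.0):
    """Per-share profit to the option holder if the option is exercised
    (or allowed to expire) when the stock trades at stock_price."""
    exercise_value = max(stock_price - strike, 0.0)  # never exercise at a loss
    return exercise_value - premium


if __name__ == "__main__":
    for price in (12.0, 16.0, 20.0):
        print(price, call_option_net_profit(price))
```

At $16 the holder just breaks even. At $20 the option is worth $5 on exercise, five times the $1 paid for it, for a net gain of $4 per share. Below $15 the holder simply loses the $1 premium.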

A more complicated example is called a col- 
lateralized mortgage obligation, or CMO. A 
CMO is a bundle of loans, generating cash flows 
from interest payments and repayments of principal. The cash 
flows vary with fluctuations in future interest rates, especially when 
lower rates induce people to prepay loans, perhaps by refinancing. 
A typical CMO consists of 30-year mortgages with monthly pay- 
ments. That’s 360 cash flows in all, depending on 360 interest 
rates—in mathematical terms, 360 variables. 

The trick is to compute an “expected present value” of the CMO, 
averaged over all possible fluctuations of the 360 interest rates. 
Parameterized by the probabilities with which fluctuations occur, 
this calculation can be formulated as a multivariate integral over 
the 360-dimensional unit cube (all probabilities lie between 0 and 
1). The function being integrated is not especially complicated, 
conceptually; it combines simple assumptions about the way 
interest rates fluctuate and the way prepayments are made with 
standard algebraic calculations of discount factors and annuities. 
(Some technical complications arise from rules that divide the 
cash flows into separate streams known as tranches.) Evaluating 
the function, however, is a tedious and time-consuming task, even 
for a fast computer; each evaluation requires upwards of a hundred 
thousand floating-point operations. Time is money on Wall Street, 
so financial firms have an interest in finding good, quick 
algorithms for approximating multivariate integrals with as few 
function evaluations as possible. 

That’s where low-discrepancy algorithms come in. Wall Street is 
already a big consumer of Monte Carlo calculations, which promise 
an error proportional to $1/\sqrt{N}$ from a random sample of $N$ 
points, with a constant of proportionality independent of the 
dimension d. But low-discrepancy algorithms promise an error 
proportional to $(\log N)^d/N$ based on a deterministic sample of 
$N$ points. For sufficiently large $N$, this error is smaller than 
that of the Monte Carlo method. In other words, a deterministic 
method may provide high accuracy with less computation than a 
random-sampling approach. 

To be fair, the notions of error are different for the two 
approaches, so the results aren’t strictly comparable. Moreover, 
the deterministic error’s factor of $(\log N)^d/N$ indicates a 
potentially rapid degradation as the number of variables 
increases. 
For d = 360 and any practical sample size N, for example, 
$(\log N)^d/N$ is astronomically large, which is all but useless as an 
error bound. Thus, conceivably, the low-discrepan- 
cy method could outperform the Monte Carlo 
method only when an absurdly high level of accura- 
cy is required. Still, it seemed worth trying low-dis- 
crepancy algorithms in the hope that the actual error 
in practical problems would be much much smaller 
than the bound suggests, and perhaps even smaller 
than the Monte Carlo error. 
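
To get a feel for the two rates, one can compare their orders of magnitude directly. The sketch below rests on assumptions that are not in the text: natural logarithms, a sample of N = 1,000,000 points (the size shown in Figure 8), and constants of proportionality set to 1. It works with base-10 exponents because the numbers themselves overflow ordinary floating point.

```python
import math


def log10_low_discrepancy_rate(n, d):
    """Base-10 exponent of (ln n)**d / n, computed in logs to avoid overflow."""
    return d * math.log10(math.log(n)) - math.log10(n)


def log10_monte_carlo_rate(n):
    """Base-10 exponent of 1 / sqrt(n)."""
    return -0.5 * math.log10(n)


if __name__ == "__main__":
    n = 10 ** 6  # assumed sample size
    print("Monte Carlo exponent:", log10_monte_carlo_rate(n))
    for d in (2, 360):
        print("d =", d, "low-discrepancy exponent:", log10_low_discrepancy_rate(n, d))
```

Under these assumptions the low-discrepancy rate beats Monte Carlo comfortably for d = 2, while for d = 360 the bound has an exponent in the hundreds and says nothing useful about the actual error.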


Spassimir Paskov as a graduate stu- 
dent at Columbia University (top), and 
now as a risk analyst at the Manhattan 
branch of the Union Bank of 
Switzerland (bottom). (Photo courtesy 
of Spassimir Paskov.) 


That was Vanderhoof’s intuition, at least. “My interest in 
[Woźniakowski’s result] was immediately the question of handling 
derivatives,” he recalls. After several phone calls, he sent the 
Columbia researchers a simplified, “toy” financial problem. 
Spassimir Paskov, then a graduate student in computer science, 
implemented low-discrepancy algorithms in a program he called 
FINDER (for FINancial DERivative). Paskov (who received a Ph.D. in 
1994, and has now traded in his student sweatshirt and jeans for a 
three-piece suit appropriate for his new job as a risk analyst at 
the Manhattan branch of the Union Bank of Switzerland) 
demonstrated the power of the low-discrepancy algorithm using 
Sobol points on the toy problem. At this point, Vanderhoof put the 
Columbia group in touch with the investment firm Goldman Sachs, 
which gave the computer scientists a full-fledged CMO, including 
proprietary information, to work on. 

“What happened was really a surprise,” says Traub. The 
researchers were expecting mixed results, with their new algorithm 
outperforming the Monte Carlo method on some parts of the prob- 
lem and falling behind on others. Instead, says Traub, “in every 
instance the deterministic method beat Monte Carlo for these high- 
dimensional integrals, absolutely contrary to the conventional wis- 
dom.” 

According to Paskov, his computer runs show the low-discrep- 
ancy algorithm beating Monte Carlo on three different counts: 
speed, accuracy, and confidence (meaning that the deterministic 
answer rarely strays from the “true” value of the integral by much). 
For example, the method using Sobol points always terminates two 
to five times faster than the Monte Carlo method, often with small- 
er error (see Figure 8). 

Nobody knows why the results are so one-sided. “It’s a very 
interesting theoretical problem,” notes Vanderhoof. Woźniakowski 
conjectures that the advantage is due to a property of the 
function being integrated. Namely, because of the discount factor 
in computing the present value of cash flows, the variables 
representing interest rates in the distant future are less 
important than “short-term” variables; this lowers the effective 
dimension of the problem. But there’s no mathematical theorem yet. 
With problems like that still open, says Traub, “we’re just at the 
beginning.” 


[Figure 8 plot: percentage error against sample points (1,000,000), 
dimension = 360; curves labeled Sobol and Random.] 

Figure 8. A comparison of Monte Carlo and low-discrepancy 
algorithms for a financial calculation. The deterministic method 
using Sobol points (blue curve) gets an accurate answer with far 
fewer function evaluations than any of three runs using the Monte 
Carlo method (black curves). (Figure courtesy of Joseph Traub and 
Simon Baker, Columbia University.) 